1. Introduction

This paper describes several general-purpose data-flow analysis algorithms that have been designed and implemented for the SETL optimizer. These algorithms include interval analysis, interprocedural and intraprocedural 'forward' and 'backward' data-flow analysis for 'bitvectoring' problems, and code motion. Most of these algorithms use new techniques which improve their performance significantly as compared with traditional methods.

Although these algorithms reflect the special SETL semantic environment, they do so only to a limited extent, and can therefore support a variety of optimizations for most programming languages and systems. In the SETL optimizer they support about half a dozen optimizations, including classical ones such as redundant expression elimination, live variables analysis, and reaching definitions analysis, and also some special SETL optimizations such as copy elimination, copy motion, and elimination of data conversions.

Our algorithms operate on an intermediate-level representation of the program to be analyzed, in which code is partitioned into basic blocks organized as a flow graph (see the next section for a detailed summary of terminology and notation). In preparing to apply these algorithms, we first perform a simple analysis of the interprocedural call pattern of the program to be optimized. This builds up data structures which are used later in interprocedural data-flow analysis. Then the interval structure of each subprocedure is analyzed, using a variant of an algorithm of Tarjan which produces a rather compact interval representation. Our variant of Tarjan's algorithm also handles irreducible flow graphs in a simple and efficient manner, and prepares for later code motion algorithms.

After these preliminary analyses have been carried out, four interval-based data-flow algorithms for problems of the 'bitvectoring' type become applicable. These algorithms solve 'forward' problems (such as available expressions analysis) and 'backward' problems (such as live variables analysis), either intraprocedurally or interprocedurally. The 'forward' algorithms also support code motion. The structure of the interprocedural algorithms reflects the call-by-value semantics of SETL, and would therefore have to be modified for languages allowing parameters passed by reference. However, our interprocedural approach is simple and efficient, and yields sharp results; attractive bounds on its performance can be proved.

This paper is organized as follows. Section 2 introduces relevant notation and terminology. Section 3 describes our interval analysis algorithm and our intraprocedural 'forward' data-flow algorithm. Section 4 presents the interprocedural 'forward' data-flow algorithm and analyzes its performance. 'Backward' data-flow algorithms are discussed in section 5 (in the intraprocedural case) and section 6 (in the interprocedural case). Section 7 describes a code motion algorithm as an extension of the forward data-flow algorithms described earlier. Section 8 discusses some possible extensions of bitvectoring data-flow problems, estimates their complexity, and assesses the feasibility of adapting our methods to these extended frameworks. Section 9 describes the application of the general algorithms of the previous sections to specific data-flow problems in the SETL optimizer.

SETL code for the various algorithms described in this paper is given in Appendix A. (The actual SETL optimizer is written in SETL, and this code is extracted from it.)

2. Terminology and Notations

In this section we outline the basic notation and terminology to be used in this paper. More information concerning standard terminology can be found, e.g., in [He] and [AU].

The program to be analyzed is assumed to have been translated into intermediate-level code, which is partitioned into extended basic blocks, i.e., single-entry multi-exit code sequences (containing no internal branches). For purposes of interprocedural analysis, we assume that each procedure call instruction constitutes a single-instruction basic block. Moreover, in the interprocedural case, each procedure p is assumed to have a unique entry block, denoted by $r_p$, and a unique exit block, denoted by $e_p$, which is also assumed to be a single-instruction block. Optionally, p may also contain a stop block, denoted by $s_p$, which terminates execution completely when entered (whereas the exit block returns to the point from which p has been called). We also assume that p's entry block $r_p$ is not contained in any loop within p. We assume that program execution always starts at a unique procedure, called the main program and denoted main, which is not recursive.

In SETL, procedure parameters are passed by value. Value transmission between actual arguments and formal parameters at a procedure call is assumed to be represented in the intermediate-level code by explicit assignment-like argument-passing instructions which occur before and after each call. In the initial form of our algorithms, we treat these special assignments as ordinary assignments, independent of the relevant procedure call, and so ignore some of the tricky issues connected with parameter passing, such as recursive stacking and unstacking. (Some amendments to this approach are discussed in section 4.) This allows us to work with a program model in which procedures are parameterless and communicate only via global variables. As in SETL, we disallow procedure variables.

To begin our analysis, we create a flow graph $G_p$ for each procedure p in the program being analyzed. $G_p$ is a rooted directed graph whose nodes are the basic blocks of p, whose root is $r_p$, and whose edges are of the form (m,n), where m and n are basic blocks in p and either n follows m immediately in the code or m contains a branch instruction to (the start of) n. We assume that each node in $G_p$ is reachable from its root. The flow graph G for the whole program is then simply taken to be the union of the flow graphs of all its procedures.

The call graph CG of the program is another rooted directed graph whose nodes are the program's procedures, whose root is the main program main, and whose edges are of the form (p,q), where p and q are procedures and p contains a call to q. Again we assume for simplicity that each node in CG is reachable from its root. Note that CG is acyclic iff the program being analyzed is non-recursive.

For any directed graph G and a depth-first spanning tree T for G, we define the loop-connectedness parameter $d = d(G,T)$ of G with respect to T as the maximal number of back edges (relative to T) lying along an acyclic path in G. Some properties of this parameter are discussed in [He]; see also [KU].

Our next step is to apply interval analysis. Here we analyze the loop structure of each procedure flow graph $G_p$. An interval I with a given entry node in such a graph is required to be a single-entry strongly connected set of nodes of $G_p$, having that node as its entry node. This definition differs in a significant detail from the more standard Allen-Cocke definition of an interval [Al], and also from the definition of an interval used by Tarjan [Ta]. We build intervals by a technique due to Tarjan, which detects intervals from innermost to outermost and reduces each such interval to a new single basic block. As it proceeds, our interval analysis algorithm also classifies each interval as being proper (meaning that it is an interval in the classical sense, i.e., has the property that each internal cycle within it contains the entry node) or else improper, in which case it is an irreducible subgraph.

The preceding remarks outline our approach to control flow, and we now turn to data flow. In this paper we consider only data-flow problems of the bitvectoring class, which are the simplest data-flow problems that arise in global program optimization. Such data-flow problems are described by a data-flow framework (L,F) (cf. [He], [AU]) for which there exists a finite set E such that $L = 2^E$, and for each $f \in F$ there exist two subsets $A_f$, $B_f$ of E such that for each $x \in L$

$$f(x) = (A_f \cap x) \cup B_f .$$

Since replacing $A_f$ by $A_f \cup B_f$ does not change f, we may assume without loss of generality that $B_f \subseteq A_f$; the meet formula in (c) below relies on this normalization. Heuristically, elements of L represent (Boolean) attribute values (such as availability of expressions, live status of variables, etc.) to be computed at certain program points, and elements of F represent transformations of these values effected by program execution. The special structure of the maps belonging to F allows each f in F to be represented compactly as an element $(A_f, B_f)$ of $L \times L$, and allows functional application, composition and meet (represented as set intersection, see (c) below) to be performed rapidly, using bit-vector and and or operations.
The following facts are standard:

(a) $f(x \wedge y) = f(x) \wedge f(y)$, for each $x, y \in L$ and $f \in F$ (distributivity);

(b) $f^2 = f$, for each $f \in F$ (idempotency);

(c) For each $f, g \in F$, let $g \circ f$ denote the functional composition of g and f, and $g \wedge f$ denote the functional (pointwise) meet of these maps. Then

$$(A_{g \circ f},\ B_{g \circ f}) = \big((A_g \cap A_f) \cup B_g,\ (A_g \cap B_f) \cup B_g\big),$$

$$(A_{g \wedge f},\ B_{g \wedge f}) = (A_g \cap A_f,\ B_g \cap B_f),$$

so that F is closed under these functional operations.

(d) The identity map $id_L$ on L also belongs to F, and we have

$$(A_{id_L},\ B_{id_L}) = (E,\ \emptyset).$$
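
To make the pair representation concrete, here is a minimal Python sketch (ours, not the SETL code of Appendix A) that encodes each $f \in F$ by its pair $(A_f, B_f)$, with subsets of E stored as integer bitmasks, and checks facts (a) through (d) exhaustively for a three-element E. The class name `BitMap` and the bitmask encoding are illustrative assumptions, and the extension by $\Omega$ described below is not modeled. The helper `make` normalizes to $B_f \subseteq A_f$ because the meet formula in (c) is only valid for normalized pairs: for instance, the unnormalized pairs $(\emptyset, \{e\})$ and $(\{e\}, \emptyset)$ would yield the constant map $\emptyset$ instead of the correct meet $x \mapsto x \cap \{e\}$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitMap:
    """A bitvectoring map f(x) = (A & x) | B; subsets of E are int bitmasks.

    The meet formula below assumes the normalization B <= A (B a subset of A).
    """
    A: int
    B: int

    @staticmethod
    def make(a: int, b: int) -> "BitMap":
        # Replacing A by A | B does not change f, so normalize on construction.
        return BitMap(a | b, b)

    def __call__(self, x: int) -> int:
        return (self.A & x) | self.B

    def compose(self, f: "BitMap") -> "BitMap":
        """self o f, i.e. x -> self(f(x)); the result is normalized if f is."""
        return BitMap((self.A & f.A) | self.B, (self.A & f.B) | self.B)

    def meet(self, f: "BitMap") -> "BitMap":
        """Pointwise meet x -> self(x) & f(x); valid for normalized pairs only."""
        return BitMap(self.A & f.A, self.B & f.B)


def identity(nbits: int) -> BitMap:
    """Fact (d): (A_id, B_id) = (E, empty set)."""
    return BitMap((1 << nbits) - 1, 0)


# Exhaustive check of facts (a)-(d) for a three-element E.
E_BITS = 3
ALL = range(1 << E_BITS)
MAPS = [BitMap(a, b) for a in ALL for b in ALL if (b & ~a) == 0]  # normalized pairs
for g in MAPS:
    for f in MAPS:
        for x in ALL:
            assert g.compose(f)(x) == g(f(x))             # (c), composition
            assert g.meet(f)(x) == g(x) & f(x)            # (c), meet
for f in MAPS:
    for x in ALL:
        assert f(f(x)) == f(x)                            # (b), idempotency
        assert identity(E_BITS)(x) == x                   # (d), identity
        assert all(f(x & y) == f(x) & f(y) for y in ALL)  # (a), distributivity
```

Each operation costs a constant number of machine-word and/or operations when |E| fits in a word, which is what makes elimination methods attractive for this class of frameworks.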

For the sake of convenience we extend the framework (L,F) by introducing a special new and largest element $\Omega \in L$, denoting an undefined data value, and a new function $f_\Omega \in F$, denoting an undefined flow, which maps L into $\{\Omega\}$. All other functions f are extended so that $f(\Omega) = \Omega$. (Interpreting elements of L as predicates on the program states, $\Omega$ corresponds to the predicate false; $f_\Omega$ describes the effect of executing an 'abort' statement.)

Data-flow analysis problems can be either 'forward' analyses (like available expressions analysis), in which attributes are computed by tracing execution flow in a forward direction, from program (or subprocedure) entry up to any given program point, or 'backward' analyses (like live variables analysis), in which attributes are computed by tracing execution flow in a backward direction, from program (or subprocedure) exits back to any given program point. Also, each analysis can be performed either intraprocedurally, i.e., separately for each procedure, to gather information about the behavior of its local variables, or interprocedurally. In interprocedural analysis a program is analyzed as a whole, to gather information about global variables and procedure parameters. This paper describes algorithms for all combinations of these classes of data-flow analyses.

Consider first intraprocedural forward analysis. Suppose that we are given a flow graph $G_p$ of some procedure p. Knowing the semantics of the operations within each basic block, we can associate a data-flow map $f_{(m,n)} \in F$ with each edge $(m,n) \in G_p$. This map describes the change in data as control advances from the start of m, through m, to the start of n. (Note that we assume here that the effect of each procedure call within p is already known, which is the case, e.g., if our analysis deals only with local variables of p, which are unaffected by such a call.) The data-flow problem we wish to solve can be formulated in terms of these functions. Specifically, for each node n in p, let $x_n \in L$ denote the data value at entry to the basic block n, and let $x_0 \in L$ denote (worst-case) attribute information assumed at entry to p. Then we need to solve the following standard set of data-flow equations:

$$
\begin{aligned}
x_{r_p} &= x_0, \\
x_n &= \bigwedge \{ f_{(m,n)}(x_m) : (m,n) \in G_p \} \quad \text{for each node } n \neq r_p \text{ in } p,
\end{aligned}
\tag{2.1}
$$

or, more precisely, to compute the maximal fixpoint of these equations.
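
For comparison with the elimination method used in this paper, the following Python sketch solves (2.1) by plain round-robin iteration, the simplest of the standard alternatives. It is only an illustration under stated assumptions: maps are normalized pairs $(A, B)$ stored as integer bitmasks, the iteration starts from E at every node other than $r_p$ (instead of from $\Omega$), and the function name `solve_forward` and the small available-expressions example are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Map = Tuple[int, int]  # (A, B): f(x) = (A & x) | B, subsets of E as bitmasks


def apply(f: Map, x: int) -> int:
    a, b = f
    return (a & x) | b


def solve_forward(nodes: List[Hashable], root: Hashable,
                  edges: Dict[Tuple[Hashable, Hashable], Map],
                  x0: int, nbits: int):
    """Round-robin iteration towards the maximal fixpoint of (2.1).

    nodes should be listed in reverse postorder (root first) for fast convergence.
    Returns (x, passes), where x maps each node to its entry value.
    """
    top = (1 << nbits) - 1          # E itself; Omega is not modeled here
    preds = {n: [] for n in nodes}
    for (m, n) in edges:
        preds[n].append(m)
    x = {n: top for n in nodes}
    x[root] = x0
    passes, changed = 0, True
    while changed:
        changed = False
        passes += 1
        for n in nodes:
            if n == root:
                continue            # x_{r_p} = x_0
            new = top               # meet (intersection) over all predecessors
            for m in preds[n]:
                new &= apply(edges[(m, n)], x[m])
            if new != x[n]:
                x[n], changed = new, True
    return x, passes


# Two expressions e0 (bit 0) and e1 (bit 1); loop 1 -> 2 -> 1.
EDGES = {("r", 1): (0b11, 0b01),   # block r computes e0
         (1, 2): (0b10, 0b10),     # block 1 kills e0 and computes e1
         (2, 1): (0b11, 0b00),     # block 2 has no effect
         (2, 3): (0b11, 0b00)}
x, passes = solve_forward(["r", 1, 2, 3], "r", EDGES, x0=0, nbits=2)
# x == {"r": 0, 1: 0, 2: 0b10, 3: 0b10}, passes == 3
```

Although block r computes e0, e0 is not available at node 1, because block 1 kills it before control returns along the back edge (2,1); the third pass only confirms that nothing changed.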

A variety of algorithms for obtaining this fixpoint are known (cf. [He], [AU] for a survey). In this paper we use an interval-based elimination algorithm, which is described in section 3 and which resembles other such algorithms (such as that of [AC]), but differs from them in significant details such as the way in which irreducible flow graphs are handled.

Similar equations which relate data at each node to data at its successors, rather than its predecessors, can be used to describe intraprocedural backward analysis. However, certain significant differences between backward and forward analysis require careful formulation and treatment. Section 5 discusses these issues and describes an intraprocedural interval-based elimination algorithm for the solution of backward data-flow problems.

The next kind of analysis that we consider is interprocedural forward analysis. Two main difficulties must be overcome in adapting the intraprocedural approach sketched above to the interprocedural case. First, we cannot assign data-flow maps a priori to edges (m,n) where m is a call block, since initially the effect of call blocks is not known. Second, we have to determine an attribute value $x_{r_p}$ at entry to each procedure p. In the intraprocedural case $x_{r_p}$ is generally chosen to reflect worst-case assumptions concerning data at entry to p, but to retain accuracy in the interprocedural case we will only want to make such assumptions at entry to the main program.

The problems just noted are analyzed systematically from a theoretical point of view in [SP, section 3]. There, the preceding data-flow equations (2.1) are reformulated as equations involving data-flow functions $\phi_n$, where, for each node n in a procedure p, $\phi_n$ denotes the effect, on the attributes we wish to calculate, of the advance of control from entry to p to the start of n along all interprocedurally valid and balanced paths (i.e., execution paths in which each procedure call is properly terminated). These equations are

$$
\begin{aligned}
\phi_{r_p} &= id_L, \\
\phi_n &= \bigwedge \{ h_{(m,n)} \circ \phi_m : (m,n) \in G_p \} \quad \text{for each node } n \neq r_p \text{ in } p,
\end{aligned}
\tag{2.2}
$$

where

$$
h_{(m,n)} =
\begin{cases}
f_{(m,n)}, & \text{if } m \text{ is not a call block}, \\
\phi_{e_q}, & \text{if } m \text{ is a call block calling procedure } q.
\end{cases}
$$
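
Purely as an illustration of what (2.2) asks for, the Python sketch below solves it by naive round-robin iteration over all procedures at once; it is neither the iterative technique of [SP] nor the interval-based technique of section 4. It assumes normalized $(A, B)$ bitmask pairs, starts every $\phi_n$ other than the entry maps at the constant map with value E (the largest element of F before the $\Omega$ extension), and labels each edge leaving a call block with the name of the called procedure. All names are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple, Union

Map = Tuple[int, int]    # normalized (A, B): f(x) = (A & x) | B
Label = Union[Map, str]  # an edge map, or the name of the called procedure


def compose(g: Map, f: Map) -> Map:
    """g o f, following fact (c)."""
    return ((g[0] & f[0]) | g[1], (g[0] & f[1]) | g[1])


def meet(g: Map, f: Map) -> Map:
    return (g[0] & f[0], g[1] & f[1])


def solve_phi(nodes: List[Hashable], entries: Dict[str, Hashable],
              exits: Dict[str, Hashable],
              edges: Dict[Tuple[Hashable, Hashable], Label],
              nbits: int) -> Dict[Hashable, Map]:
    """Naive round-robin iteration for equations (2.2)."""
    full = (1 << nbits) - 1
    ident, top = (full, 0), (full, full)   # top = constant map x -> E
    entry_nodes = set(entries.values())
    preds = {n: [] for n in nodes}
    for (m, n) in edges:
        preds[n].append(m)
    phi = {n: (ident if n in entry_nodes else top) for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n in entry_nodes:
                continue                   # phi_{r_p} = id_L
            new = top
            for m in preds[n]:
                label = edges[(m, n)]
                # h_(m,n): the edge map, or phi_{e_q} if m is a call block calling q
                h = phi[exits[label]] if isinstance(label, str) else label
                new = meet(new, compose(h, phi[m]))
            if new != phi[n]:
                phi[n], changed = new, True
    return phi


# main: r_main -> c (calls q) -> e_main;  q: r_q -> a -> e_q.
EDGES = {("r_q", "a"): (0b10, 0b00),     # block r_q kills e0
         ("a", "e_q"): (0b11, 0b10),     # block a computes e1
         ("r_main", "c"): (0b11, 0b01),  # block r_main computes e0
         ("c", "e_main"): "q"}           # call block c calls q
phi = solve_phi(["r_q", "a", "e_q", "r_main", "c", "e_main"],
                {"q": "r_q", "main": "r_main"}, {"q": "e_q", "main": "e_main"},
                EDGES, nbits=2)
# phi["e_q"] == (0b10, 0b10): q kills e0 and makes e1 available
```

Because $\phi_{e_q}$ summarizes the whole body of q, the call edge (c, e_main) simply reuses it, so $\phi_{e_{main}}$ also ends up as $(\{e_1\}, \{e_1\})$.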


It is shown in [SP] that a (maximal fixpoint) solution of these equations exists if L is finite (which is the case for bitvectoring frameworks), and can be found by methods similar to those available in the intraprocedural case. It is also shown there that this solution coincides with a 'meet over all paths' solution. [SP] discusses an iterative technique for finding that solution; in section 4 we will present a more efficient interval-based elimination technique. It should be emphasized again that elimination techniques are most appropriate for bitvectoring frameworks, in which functional compositions and meets are almost as easy and fast to perform as functional applications.

Once the solution of (2.2) has been found, we can compute data at procedure entries by solving the following data-flow equations (where $x_{r_p}$ denotes data at entry to the procedure p, and where $x_0 \in L$ denotes attributes to be assumed at the start of execution; see [SP, Equations 3-3]):

$$
\begin{aligned}
x_{r_{main}} &= x_0, \\
x_{r_q} &= \bigwedge \{ \phi_c(x_{r_p}) : c \text{ is a call to } q \text{ from } p \} \quad \text{for each procedure } q \neq main.
\end{aligned}
\tag{2.3}
$$

An iterative solution technique for these equations is discussed in section 4. Finally, applying the maps $\phi_n$ to the entry data $x_{r_p}$ yields data at entry to any program basic block.
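
Once the maps $\phi_c$ at call blocks are known, (2.3) is a small system over procedure entries alone. The following Python sketch iterates it round-robin; it assumes the $\phi_c$ are given as normalized $(A, B)$ bitmask pairs and starts every $x_{r_q}$ other than $x_{r_{main}}$ at E. The function name and the recursive example are hypothetical, and this is not necessarily the iterative technique of section 4.

```python
from typing import Dict, List, Tuple

Map = Tuple[int, int]  # normalized (A, B): f(x) = (A & x) | B


def solve_entries(procs: List[str], main: str,
                  calls: List[Tuple[str, str, Map]],
                  x0: int, nbits: int) -> Dict[str, int]:
    """Round-robin iteration for equations (2.3).

    calls holds one triple (p, q, phi_c) per call c to q from p, where phi_c
    is the solution of (2.2) at the call block c.
    """
    top = (1 << nbits) - 1                 # E; Omega is not modeled here
    x = {p: top for p in procs}
    x[main] = x0                           # x_{r_main} = x_0
    changed = True
    while changed:
        changed = False
        for q in procs:
            if q == main:
                continue
            new = top
            for p, callee, (a, b) in calls:
                if callee == q:
                    new &= (a & x[p]) | b  # phi_c(x_{r_p})
            if new != x[q]:
                x[q], changed = new, True
    return x


# main calls q where e0 and e1 are both available; q calls itself recursively
# at a point reached after killing e0.
CALLS = [("main", "q", (0b11, 0b11)), ("q", "q", (0b10, 0b00))]
x = solve_entries(["main", "q"], "main", CALLS, x0=0, nbits=2)
# x == {"main": 0, "q": 0b10}
```

The recursive call contributes the second pass: it removes e0 from the entry data of q, while e1 survives along both calls.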
A similar approach can be used for interprocedural backward analysis. Specifically, for each basic block n let $\psi_n$ denote the data-flow map describing the effect, on the attributes that we wish to compute, of the advance of control from the start of n to the exit $e_p$ of the procedure p containing n, along execution paths in which each procedure call is properly terminated, but including also incomplete paths which terminate at some stop block. These maps satisfy the following equations (where $x_0 \in L$ denotes worst-case attributes assumed at program exits):

$$
\begin{aligned}
\psi_{s_p} &= x_0 \quad \text{(a constant map), for each } p; \\
\psi_{e_p} &= id_L, \quad \text{for each } p; \\
\psi_m &= \bigwedge \{ h_{(m,n)} \circ \psi_n : (m,n) \in G_p \}, \quad \text{for all other } m \text{ in } p,
\end{aligned}
\tag{2.4}
$$

where

$$
h_{(m,n)} =
\begin{cases}
f_{(m,n)}, & \text{if } m \text{ is not a call block}, \\
\psi_{r_q}, & \text{if } m \text{ is a call block calling procedure } q.
\end{cases}
$$

Solution methods for these equations, as well as the formulation of equations analogous to (2.3) and solution methods for them, are discussed in section 6.
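
The backward system (2.4) admits the same kind of naive iteration; section 6 discusses the solution methods actually used. The Python sketch below uses hypothetical names and takes anticipable (very busy) expressions as an example of a backward problem whose meet is intersection. It assumes normalized $(A, B)$ bitmask pairs and encodes the constant map $\psi_{s_p} = x_0$ as the normalized pair $(x_0, x_0)$.

```python
from typing import Dict, Hashable, List, Tuple, Union

Map = Tuple[int, int]    # normalized (A, B): f(x) = (A & x) | B
Label = Union[Map, str]  # an edge map, or the name of the called procedure


def compose(g: Map, f: Map) -> Map:
    """g o f, following fact (c)."""
    return ((g[0] & f[0]) | g[1], (g[0] & f[1]) | g[1])


def meet(g: Map, f: Map) -> Map:
    return (g[0] & f[0], g[1] & f[1])


def solve_psi(nodes: List[Hashable], entries: Dict[str, Hashable],
              exits: Dict[str, Hashable], stops: Dict[str, Hashable],
              edges: Dict[Tuple[Hashable, Hashable], Label],
              x0: int, nbits: int) -> Dict[Hashable, Map]:
    """Naive round-robin iteration for the backward equations (2.4)."""
    full = (1 << nbits) - 1
    ident, top = (full, 0), (full, full)                  # top = constant map x -> E
    fixed = {e: ident for e in exits.values()}            # psi_{e_p} = id_L
    fixed.update({s: (x0, x0) for s in stops.values()})   # psi_{s_p} = constant x0
    succs = {n: [] for n in nodes}
    for (m, n) in edges:
        succs[m].append(n)
    psi = {n: fixed.get(n, top) for n in nodes}
    changed = True
    while changed:
        changed = False
        for m in nodes:
            if m in fixed:
                continue
            new = top
            for n in succs[m]:
                label = edges[(m, n)]
                # h_(m,n): the edge map, or psi_{r_q} if m is a call block calling q
                h = psi[entries[label]] if isinstance(label, str) else label
                new = meet(new, compose(h, psi[n]))
            if new != psi[m]:
                psi[m], changed = new, True
    return psi


# Anticipable expressions e0 (bit 0) and e1 (bit 1).
# q: r_q -> e_q, and block r_q evaluates e0.
# main: r_main -> c (calls q) -> e_main and r_main -> s_main (stop); r_main kills e1.
NODES = ["e_q", "r_q", "e_main", "c", "s_main", "r_main"]
ENTRIES = {"q": "r_q", "main": "r_main"}
EXITS = {"q": "e_q", "main": "e_main"}
STOPS = {"main": "s_main"}
EDGES = {("r_q", "e_q"): (0b11, 0b01),
         ("r_main", "c"): (0b01, 0b00),
         ("r_main", "s_main"): (0b01, 0b00),
         ("c", "e_main"): "q"}
psi = solve_psi(NODES, ENTRIES, EXITS, STOPS, EDGES, x0=0b00, nbits=2)
# psi["c"] == (0b11, 0b01); psi["r_main"] == (0b00, 0b00) because of the stop path
```

The call block c inherits $\psi_{r_q}$, so e0 is anticipable there; at $r_{main}$ the path into the stop block, where only $x_0 = \emptyset$ is assumed, removes it again.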


3. A Modified Interval Analysis Technique

In this section we sketch a modified approach to interval analysis, based primarily on Tarjan's interval analysis technique [Ta] but pragmatically adapted. This approach also handles irreducible flow graphs in a reasonably simple and efficient manner.

The classic interval analysis technique of Allen and Cocke (cf. [AC]) builds up a sequence of derived graphs for a given flow graph. Each such graph results from the previous one by simultaneously reducing all first-order intervals to single nodes. This has various disadvantages; e.g., the nodes in one derived graph, even if unaffected by the reduction of this graph, are duplicated in the next derived graph.

Moreover, in this traditional approach, intervals are not required to be strongly connected, which creates extra complications for code motion, where normally we only wish to move code lying in the strongly connected part of an interval I out of I.

A third potential disadvantage lies in the handling of irreducible flow graphs, which requires special 'node-splitting' mechanisms (see [He]).

A different and potentially more efficient approach to interval analysis, which relieves some of these objections, has been suggested by Tarjan [Ta], and will be followed and adapted here. The relevant theory is suppressed in our description. It can be found in [Ta]; additional exposition of this theory can be found in [SS], whose notation we shall closely follow. Formal SETL code for our interval analysis algorithm is given in Appendix A.

Let $G = G_p$ be a given (intraprocedural) flow graph having an entry node r, and let T be a depth-first spanning tree for G. We assume for simplicity that r is not the target of a back edge with respect to T. Among other objects, our algorithm will compute a map 'intof' which maps each node in a (strongly connected) interval to the interval itself. Each interval will be represented by a new flow-graph node, logically placed in G and T 'just before' the first node of the interval (see below for more details). A crude sketch of our interval analysis algorithm is as follows:

(1) Initialize intof to the null map. Mark all nodes as 'proper' (meaning, heuristically, that they are not (as yet) heads of multiple-entry loops). Also initialize a set 'proper-ints' to the null set and a tuple 'intervals' to the null tuple.

(2) Iterate in reverse preorder (of the tree T) through all nodes which are back-edge targets.

(3) For each such node x compute the set

$$\mathit{reachunder}(x) = \{\, \mathit{intof}^{\infty}(y) : x \text{ can be reached from } y \text{ along a path not going through } x \text{ whose final edge is a back edge} \,\},$$

where $\mathit{intof}^{\infty}(y) = z$ if there exists $k \ge 0$ such that $\mathit{intof}^{k}(y) = z$ and $\mathit{intof}(z)$ is undefined.

(4) If the root node r belongs to reachunder(x), then x belongs to a multiple-entry loop. Mark x as 'improper' and return to step (2) to process the next back-edge target.

(5) If $r \notin \mathit{reachunder}(x)$, then x is the head of the single-entry loop $I(x) = \mathit{reachunder}(x) \cup \{x\}$. If reachunder(x) contains no improper nodes, then I(x) is the maximal strongly connected interval (in the classical sense) with head x. Otherwise I(x) is a single-entry irreducible flow subgraph.

(6) Having computed I(x) in (5), we reduce it to a single new node, a representative of which is inserted at a place 'just before' x in G and T. We set $\mathit{intof}(y) := I(x)$ for all $y \in I(x)$. New edges resulting from this graph modification are classified into three categories: real edges of the form $(u, I(x))$, which replace edges $(u,x) \in G$ where $u \notin I(x)$; an additional real edge $(I(x), x)$; and virtual edges of the form $(I(x), v)$, where there exists $u \in I(x)$ such that $(u,v) \in G$ and $v \notin I(x)$. After its formation, I(x) is appended to the tuple 'intervals', and is placed in the set 'proper-ints' iff it is a proper (reducible) interval. Once this is done, we return to step (2) to process the next back-edge target.

(7) When the iteration at step (2) terminates, all remaining nodes, i.e., nodes x for which intof(x) is still undefined, are nodes not contained within any single-entry loop. Let G' denote the graph resulting from all the reductions that have been carried out. If any node in G' is marked 'improper', then G' is an irreducible flow graph; if not, then G' is a DAG (directed acyclic graph). In either case we set $\mathit{intof}(y) = r$ for all $y \in G'$ (i.e., we regard G' as an interval, logically identified with its head, the entry node r). Here no modified or additional edges need be created. G' (identified with r) is appended to 'intervals' and will be referred to as the outermost interval. If acyclic, G' is also placed in 'proper-ints'.

Finally, we walk the tree T again, in reverse postorder, to construct a map 'int-nodes', which sends each interval I (proper or improper) to the tuple of all its nodes in reverse postorder; this constitutes an interval order among the nodes of proper intervals. The output of our algorithm thus consists of the following objects:

intof - maps each node to the interval containing it;
intervals - the sequence of all intervals in reverse preorder (note that this order is inner-to-outer, i.e., in this order each interval precedes all intervals containing it);
int-nodes - maps each interval to the sequence of all its nodes in reverse postorder;
proper-ints - the set of all proper (reducible) intervals;
vedges - the set of all additional virtual edges, representing flow in some derived graph of the given flow graph.
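
The following Python sketch follows steps (1) through (7) and the construction of int-nodes on a small scale. It is not the SETL code of Appendix A and makes no attempt at Tarjan's efficiency: reachunder(x) is found by a plain backward search over the original flow graph, which we assume gives the same sets here because every interval formed so far is strongly connected, and back edges are recognized by the usual preorder ancestor test. Interval nodes are represented by hypothetical tuples ("I", x), and the outermost interval is identified with the root, as in step (7).

```python
from typing import Dict, Hashable, List


def interval_analysis(succ: Dict[Hashable, List[Hashable]], root: Hashable) -> dict:
    """Small-scale sketch of steps (1)-(7) plus the int-nodes walk.

    Interval nodes are the hypothetical tuples ("I", head); the outermost
    interval is identified with root.  Assumes every node is reachable from
    root and that root is not a back-edge target.
    """
    pre, post, last = {}, {}, {}
    count = {"pre": 0, "post": 0}

    def dfs(u):  # builds the depth-first spanning tree T (recursive: small graphs only)
        pre[u] = count["pre"]
        count["pre"] += 1
        for v in succ.get(u, []):
            if v not in pre:
                dfs(v)
        last[u] = count["pre"] - 1  # largest preorder number in u's subtree
        post[u] = count["post"]
        count["post"] += 1

    dfs(root)
    nodes = list(pre)
    preds = {u: [] for u in nodes}
    for u in nodes:
        for v in succ.get(u, []):
            preds[v].append(u)

    def is_back(v, w):  # (v, w) is a back edge iff w is an ancestor of v in T
        return pre[w] <= pre[v] <= last[w]

    intof, improper, vedges = {}, set(), set()  # step (1)
    intervals, proper_ints = [], set()

    def find(y):  # intof^infinity(y)
        while y in intof:
            y = intof[y]
        return y

    # Step (2): back-edge targets in reverse preorder.
    targets = sorted({w for v in nodes for w in succ.get(v, []) if is_back(v, w)},
                     key=lambda w: -pre[w])
    for x in targets:
        # Step (3): nodes that reach a back-edge source of x along paths avoiding x.
        stack = [v for v in preds[x] if v != x and is_back(v, x)]
        seen = set(stack)
        while stack:
            for w in preds[stack.pop()]:
                if w != x and w not in seen:
                    seen.add(w)
                    stack.append(w)
        reachunder = {find(y) for y in seen}
        if root in reachunder:  # step (4): x heads a multiple-entry loop
            improper.add(x)
            continue
        # Steps (5)-(6): reduce I(x) = reachunder(x) + {x} to a new node.
        interval = ("I", x)
        for y in reachunder | {x}:
            intof[y] = interval
        for y in nodes:  # virtual edges leaving I(x)
            if find(y) == interval:
                for v in succ.get(y, []):
                    if find(v) != interval:
                        vedges.add((interval, find(v)))
        intervals.append(interval)
        if not reachunder & improper:
            proper_ints.add(interval)

    # Step (7): everything left forms the outermost interval G', identified with root.
    top_level = [y for y in nodes + intervals if y not in intof]
    for y in top_level:
        if y != root:
            intof[y] = root
    interval_set = set(intervals)
    intervals.append(root)
    if not any(y in improper for y in top_level):
        proper_ints.add(root)

    # int-nodes: members in reverse postorder, each interval node just before its head.
    def rpo_key(y):
        return (-post[y[1]], 0) if y in interval_set else (-post[y], 1)

    int_nodes = {J: sorted([y for y in intof if intof[y] == J] + ([root] if J == root else []),
                           key=rpo_key)
                 for J in intervals}
    return {"intof": intof, "intervals": intervals, "int_nodes": int_nodes,
            "proper_ints": proper_ints, "vedges": vedges}
```

For the graph r -> a -> b -> c with edges c -> b, c -> a and c -> d, this sketch reduces the inner loop {b, c} before the loop headed by a, so 'intervals' comes out inner-to-outer. For r -> a, r -> b, a -> b, b -> a, the loop {a, b} can be entered at both a and b, so a is marked improper and the outermost interval is not proper.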

Remarks: (1) Tarjan's original algorithm makes no distinction between intervals and their heads. We have chosen to make this distinction for two main reasons: (a) If x is an interval head, and there exists an edge $(x,u) \in G$ such that $u \notin I(x)$, we wish to distinguish between the flow from x to u effected just by that edge, and the flow from x through I(x) to u. It is convenient to do so by introducing I(x) as a separate flow-graph node, which we shall sometimes call the preheader of x. (b) In applying code motion, the entry to an interval I(x) becomes a program point logically different from the entry to its head x, since code will be moved out of the interval loop and inserted between these two points, which makes the above distinction essential. For these reasons we represent intervals as additional flow-graph nodes.

Our treatment of irreducible flow graphs has the following useful features: (a) Irreducibilities are 'localized'. That is, if they are contained within some single-entry loop I, then their effects need be considered only within I. This still allows us to move code out of I. (The same idea of localizing 'bad' flow also plays a role in Rosen's data-flow analysis technique [Ro].) (b) In any subsequent data-flow analysis step, single-entry loops I containing irreducibilities must be analyzed using iterative techniques. The reverse postorder of nodes serves this purpose quite well. Indeed, according to Hecht and Ullman [HU], no more than d + 1 iterations through the nodes of I are needed for iterative calculation of the relevant flow maps to stabilize, where d is the loop-connectedness parameter of I (cf. [HU] for more details; similar arguments are used in section 4 below to estimate the efficiency of our interprocedural analysis techniques).

Next we describe the application of the data structures computed by the above interval analysis in solving data-flow problems of the bitvectoring class. In this section we consider only intraprocedural forward data-flow analysis. The corresponding interprocedural and/or 'backward' analyses are studied in detail in the following sections. Much of the following description is standard (cf., for example, [AC]); the novel features of our algorithm mainly reflect our somewhat nonstandard representation of intervals.

We assume that we are given an intraprocedural flow graph $G_p$, modified by the above interval analysis, plus the various output objects computed by that analysis. In addition, we assume that a preliminary pass through the procedure p (or program) being analyzed has computed a data-flow map $f_{(m,n)} \in F$, as described in section 2, for each non-virtual edge $(m,n) \in G_p$. Let $x_0 \in L$ denote the information to be assumed at the procedure entry point.

Data-flow analysis then consists of the following phases.

(1) Elimination Phase

In this phase we iterate through the intervals of the procedure in their reverse preorder (i.e., from inner to outer). In processing an interval I we compute two kinds of new data-flow maps: (a) For each outgoing virtual edge (I,v) we compute a map $f_{(I,v)}$, representing the data flow from the entry to I (i.e., the entry to its preheader), through I, to the entry to the successor node v. (b) For each node $u \in I$ we compute an auxiliary data-flow map, denoted $f_u$, which represents the data flow from the entry to the head x of I, along paths contained in I, to the entry to u.

To compute these maps, we iterate through the nodes of I in their interval order (i.e., their relative reverse postorder, or the order in which they appear in int_nodes(I)). For each node u visited in this manner, we apply the following formulas:

$$
\begin{aligned}
f_u &= \bigwedge \{ f_{(w,u)} \circ f_w : (w,u) \in G_p \text{ and } w \in I \}, \quad \text{if } u \neq x, \\
f_x &= id_L \wedge \bigwedge \{ f_{(w,x)} \circ f_w : (w,x) \in G_p \text{ and } w \in I \}.
\end{aligned}
\tag{3.1}
$$

Note that the condition $w \in I$ is required to make sure that the edge (w,u) (or (w,x)) is an internal edge of I. For example, w may be an inner interval, so that the edge (w,u) is a virtual edge, resulting from a real edge (w',u), where w' is a node within w. In this case u has two predecessors w and w', and the test $w \in I$ selects only the first one, as desired.

If $I$ is a proper interval, then it is well known that two
iterations through the nodes of $I$ are sufficient for the auxiliary
maps $\{f_u : u \in I\}$ to stabilize, and one iteration is sufficient
for the outermost interval (since it is acyclic in this case).
If $I$ is improper, then we iterate until the information stabilizes
and test explicitly for convergence. However, by [HU], the
number of iterations required to reach convergence is bounded
by $d + 1$, where $d$ is the loop-connectedness parameter of $I$.
Moreover, $d$ is obviously bounded by the number of targets of
back edges in $I$, which, by steps (4)-(6) of our interval analysis
algorithm, is equal (if $I$ is an inner interval) to 1 + the number
of heads of multiple-entry loops within $I$ but not within a
subinterval of $I$. If we denote this number by $M$, then at most
$M + 2$ iterations through the nodes of $I$ are required for the
stabilization of the above maps. Heuristically, this implies
that, in the worst case, not more than one additional iteration
is required for each 'source' of irreducibility within $I$. The
same argument obviously also applies to the outermost interval.
After computing the auxiliary maps $f_u$ in the above manner,
we compute the data-flow map $f_{(I,v)}$ defined above for each
virtual successor $v$ of $I$, using the formula

(3.2)

$$f_{(I,v)} = \bigwedge\{\, f_{(u,v)} \circ f_u \circ f_{(I,x)} : u \in I \text{ and } (u,v) \in G_p \,\}$$

Note that we could also define and compute the auxiliary maps
$f_u$ in a way which includes the flow through the preheader of $I$,
which would simplify the above formula slightly. Our main reason
for not doing so is that we may wish to perform code motion into
the preheader of $I$; such motion does not affect the maps
$f_u$ as we have defined them, but would change them if they also
reflected flow through the preheader. We refer the reader to
section 7, in which these issues are discussed in greater detail.
However, if code motion is not integrated with the analysis
which we are now describing, then we may as well include the
flow through the preheader in the computation of $f_u$, which
allows us to replace $\mathrm{id}$ by $f_{(I,x)}$ in Equations (3.1), and
to rewrite Equation (3.2) as


(3.2')

$$f_{(I,v)} = \bigwedge\{\, f_{(u,v)} \circ f_u : u \in I \text{ and } (u,v) \in G_p \,\}$$
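To make the elimination step concrete, the following Python sketch applies Equations (3.1) and (3.2') to a single interval of a forward-intersection problem such as available expressions. It is an illustration under stated assumptions, not the SETL optimizer's code. A map is represented as a hypothetical pair (a, b) with f(x) = (x & a) | b and b a subset of a. Bit vectors are Python integers limited to 8 components. The undefined map f_Ω is modelled by None, which the meet ignores. The preheader is assumed to have identity effect, so Equations (3.2) and (3.2') coincide.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical bit-vector map: f(x) = (x & a) | b with b a subset of a.
# Bits in b are generated (constant 1), bits in a - b pass through (identity),
# all other bits are killed (constant 0).
Map = Tuple[int, int]
ALL = 0xFF                       # assumption: 8 bit-vector components
ID: Map = (ALL, 0)

def compose(f: Map, g: Optional[Map]) -> Optional[Map]:
    """Return f o g (apply g first); None models the undefined map f_Omega."""
    if g is None:
        return None
    (af, bf), (ag, bg) = f, g
    b = (bg & af) | bf
    return ((ag & af) | b, b)

def meet(maps: Iterable[Optional[Map]]) -> Optional[Map]:
    """Pointwise intersection of maps; undefined (None) maps are skipped."""
    defined = [m for m in maps if m is not None]
    if not defined:
        return None
    a, b = defined[0]
    for a2, b2 in defined[1:]:
        a, b = a & a2, b & b2
    return (a, b)

def eliminate_interval(nodes: List[str], head: str,
                       edge_maps: Dict[Tuple[str, str], Map]) -> Dict[str, Optional[Map]]:
    """Iterate Equation (3.1) over `nodes` (given in interval order) until stable."""
    member = set(nodes)
    f: Dict[str, Optional[Map]] = {u: None for u in nodes}
    changed = True
    while changed:
        changed = False
        for u in nodes:
            # Only internal edges (w, u) with w in I contribute (the test w in I).
            terms = [compose(m, f[w]) for (w, t), m in edge_maps.items()
                     if t == u and w in member]
            if u == head:
                terms.append(ID)         # the empty path from the head entry
            new = meet(terms)
            if new != f[u]:
                f[u], changed = new, True
    return f

def exit_map(nodes: List[str], edge_maps: Dict[Tuple[str, str], Map],
             f: Dict[str, Optional[Map]], v: str) -> Optional[Map]:
    """Equation (3.2'): flow from the interval entry to the outside successor v."""
    member = set(nodes)
    return meet(compose(m, f[w]) for (w, t), m in edge_maps.items()
                if t == v and w in member)

# Example loop x -> a -> b -> x with exit edge a -> v: bit 0 is generated on
# x -> a and killed on a -> b; all other edges are transparent.
edges = {("x", "a"): (ALL, 0b01), ("a", "b"): (ALL & ~0b01, 0),
         ("b", "x"): ID, ("a", "v"): ID}
f = eliminate_interval(["x", "a", "b"], "x", edges)
f_exit = exit_map(["x", "a", "b"], edges, f, "v")
```

In this proper interval the maps reach their final values after two passes, and a third pass only confirms convergence. The head map f_x kills bit 0, because of the path around the loop. The exit map through a still generates bit 0.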
(2) Propagation Phase.

In this phase we iterate through the intervals of the
procedure in their preorder (from outer to inner), propagating
data from interval entries to interval nodes, using the auxiliary
maps $f_u$ computed in the previous phase.

We begin by initializing the solution map $x$ of our data-flow
problem; this is done by putting $x_{r_p} = x_0$. (Here $r_p$ is the
procedure entry, and also represents the 'outermost interval' of
the procedure.) Whenever we come to process an interval $I$, we will
already have computed attribute data $x_I$ at its entry. If $h$
denotes the head of $I$, then $x_h = f_{(I,h)}(x_I)$ represents attribute
data known at entry to the loop of $I$. Hence, for each $u \in I$,
$x_u = f_u(x_h)$ is the data attribute state at the entry to $u$.
Proceeding in this manner, we compute the value of $x$ at entry to
each basic block, which completes the intraprocedural solution
of the data-flow problem. (If the flow through the preheader
of $I$ is already recorded in the auxiliary maps $f_u$, then there
is no need to compute $x_h$, and we have $x_u = f_u(x_I)$ for each
node $u \in I$.)
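The propagation phase can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation. As in the elimination sketch, a map is a hypothetical pair (a, b) with f(x) = (x & a) | b. The interval structure is given as a hypothetical list of (name, head, nodes) triples in preorder, outermost interval first. `node_maps[u]` holds $f_u$, and `head_maps[I]` holds $f_{(I,h)}$.

```python
from typing import Dict, List, Tuple

# Hypothetical bit-vector map: f(x) = (x & a) | b with b a subset of a.
Map = Tuple[int, int]
ID: Map = (0xFF, 0)

def apply(f: Map, x: int) -> int:
    a, b = f
    return (x & a) | b

def propagate(intervals: List[Tuple[str, str, List[str]]],
              node_maps: Dict[str, Map],
              head_maps: Dict[str, Map],
              x0: int) -> Dict[str, int]:
    """intervals: (name, head, nodes) in preorder, outermost interval first."""
    x: Dict[str, int] = {}
    for k, (name, head, nodes) in enumerate(intervals):
        if k == 0:
            x[name] = x0                               # x_{r_p} = x_0
            x_head = x0                                # outermost: no preheader
        else:
            # x[name] (= x_I) was set while processing the parent interval.
            x_head = apply(head_maps[name], x[name])   # x_h = f_(I,h)(x_I)
        for u in nodes:
            x[u] = apply(node_maps[u], x_head)         # x_u = f_u(x_h)
    return x

# Outermost interval R (head r) contains block z and the inner interval L,
# whose head is h and which also contains block b.
intervals = [("R", "r", ["r", "L", "z"]), ("L", "h", ["h", "b"])]
node_maps = {"r": ID, "L": (0xFF, 0b01), "z": (0xFD, 0),
             "h": ID, "b": (0xFF, 0b10)}
x = propagate(intervals, node_maps, {"L": ID}, x0=0)
```

Because the intervals are visited outer to inner, the entry value of each inner interval is always available before that interval is processed.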

It  is  useful  to  estimate  the  time  complexity  of  the  above 
algorithm,  which  turns  out  to  be  rather  favorable. 

We define the following quantities:

$N_{\text{blocks}}$ = number of basic blocks in the flow graph
$N_{\text{ints}}$ = number of intervals
$M_{\text{irred}}$ = number of multiple-entry loops
$E_{\text{virtual}}$ = number of virtual edges
$E_{\text{in}}$ = number of internal edges in all intervals (i.e. edges
whose source and target belong to the same interval)
$E_{\text{out}}$ = number of edges going out of an interval.

Also, for each interval $I$, we put

$N_I$ = number of nodes in $I$
$M_I$ = number of multiple-entry loops in $I$
$E_I$ = number of internal edges in $I$.

We can then compute the time required by the above algorithm
as follows. Suppose that each bit-vector operation (and, or)
takes one unit of time. Then it is easily seen (cf. section 2
for details) that functional application takes 2 time units,
functional meet takes 2 time units, and functional composition
takes 4 time units. If we assume that the analysis to be
performed involves no code motion, then the application of the
modified Equations (3.1) will require at most

$$\sum_I (6E_I - 2N_I + 2)(M_I + 2 - \delta_I)$$

time units, where the summation is over all intervals $I$, and
$\delta_I$ is 1 if $I$ is the outermost interval and 0 otherwise (note
that the number of elementary meet operations required to take
a meet over a set $S$ is $|S| - 1$). Similarly, Equation (3.2')
will require

$$6E_{\text{out}} - 2E_{\text{virtual}}$$

time units, and the modified propagation phase will require

$$\sum_I 2N_I = 2(N_{\text{blocks}} + N_{\text{ints}} - 1)$$

time units.

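For example, if the procedure flow graph is reducible, then the
total number of time units required by the algorithm is at most

$$
\begin{aligned}
&12E_{\text{in}} - 4(N_{\text{blocks}} + N_{\text{ints}} - 1) + 4N_{\text{ints}} - 6E_r + 2N_r - 2 \\
&\quad + 6E_{\text{out}} - 2E_{\text{virtual}} + 2(N_{\text{blocks}} + N_{\text{ints}} - 1) \\
&= 12E_{\text{in}} + 6E_{\text{out}} - 2E_{\text{virtual}} - 6E_r - 2N_{\text{blocks}} + 2N_{\text{ints}} + 2N_r,
\end{aligned}
$$

where $E_r$ and $N_r$ denote the number of internal edges and the
number of nodes of the outermost interval.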
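The bookkeeping above can be checked mechanically. The Python sketch below evaluates the three per-phase estimates for a list of intervals and compares their sum with the closed form for the reducible case. The function names, the tuple layout and the sample numbers are made up for illustration. The outermost interval is charged $M_I + 1$ passes instead of $M_I + 2$. $N_{\text{blocks}}$ is recovered from the identity $\sum_I N_I = N_{\text{blocks}} + N_{\text{ints}} - 1$.

```python
from typing import List, Tuple

Interval = Tuple[int, int, int, bool]   # (N_I, E_I, M_I, is_outermost)

def time_units(intervals: List[Interval], e_out: int, e_virtual: int) -> int:
    """Sum of the elimination, exit-map and propagation estimates."""
    elimination = sum((6 * e - 2 * n + 2) * (m + 2 - (1 if outer else 0))
                      for n, e, m, outer in intervals)
    exits = 6 * e_out - 2 * e_virtual                  # Equation (3.2')
    propagation = sum(2 * n for n, _, _, _ in intervals)
    return elimination + exits + propagation

def reducible_closed_form(intervals: List[Interval], e_out: int, e_virtual: int) -> int:
    """12 E_in + 6 E_out - 2 E_virtual - 6 E_r - 2 N_blocks + 2 N_ints + 2 N_r."""
    e_in = sum(e for _, e, _, _ in intervals)
    n_ints = len(intervals)
    # Each block lies in exactly one interval, and each non-outermost
    # interval is itself a node of its parent interval.
    n_blocks = sum(n for n, _, _, _ in intervals) - n_ints + 1
    n_r, e_r = next((n, e) for n, e, _, outer in intervals if outer)
    return (12 * e_in + 6 * e_out - 2 * e_virtual - 6 * e_r
            - 2 * n_blocks + 2 * n_ints + 2 * n_r)

# Hypothetical reducible procedure: an outermost interval with 3 nodes and
# 3 internal edges, plus one inner loop with 2 nodes and 2 internal edges.
example = [(3, 3, 0, True), (2, 2, 0, False)]
total = time_units(example, e_out=4, e_virtual=1)
```

For this example both expressions give 66 time units.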
4. Interprocedural Forward Data-flow Analysis

In  this  section  we  describe  our  technique  for  interprocedural 
forward  data-flow  analysis.   We  will  also  analyze  its  performance 
and  efficiency  in  terms  of  the  call  graph  structure  of  the 
program  being  analyzed,  and  will  discuss  various  possible  im- 
provements and  extensions  of  our  approach. 

We remind the reader that our model of a procedural program
is one in which procedures are parameterless (or, more generally,
in which parameters are called by value). Thus the aim of our
interprocedural analysis is to determine the properties of global
variables. The model we use evades certain difficult problems,
such as the analysis of 'aliasing', that would arise in the presence
of reference parameters (cf. [ROp]), but in the semantic
environment which we assume, our algorithm yields sharp
interprocedural data-flow information.

The  algorithm  to  be  presented  below  is  based  on  a  prior 
study  of  interprocedural  analysis  by  Sharir  &  Pnueli  [SP],  and 
is  in  fact  merely  a  simple  and  efficient  implementation  of  the 
'functional  approach'  described  in  [SP]  (see  also  section  2). 
However,  we  justify  this  algorithm  by  some  new  theoretical 
results  concerning  interprocedural  flow,  which  are  detailed  below. 

Our  interprocedural  algorithm  is  quite  similar  to  the 
intraprocedural  data-flow  algorithm  given  in  the  previous  section. 
The  description  given  below  will  emphasize  the  modifications 
and  extensions  of  the  previous  algorithm  needed  for  interpro- 
cedural analysis. 

In  general  terms,  the  interprocedural  algorithm  consists 
of  the  following  phases: 

(a) Initialization. In this phase we compute data-propagation
maps $f_{(m,n)}$ for all non-virtual edges $(m,n) \in G$ such that $m$
is not a call block. If $m$ is a call block, then the effect
of flow through $m$ is not known a priori, and the determination
of that effect is in fact one of the major goals of our analysis;
for such $m$, $f_{(m,n)}$ is initially left undefined (or rather is
set to $f_\Omega$ to indicate undefined flow).

(b) Call-graph Analysis. This phase is independent of the
particular interprocedural bit-vector data-flow problem in question,
and so should be performed once prior to any solution of such
a problem. In this phase we analyze the structure of the program's
call graph CG, as follows. We first construct a depth-first
spanning tree $T$ for CG, then find all strongly-connected components
$S_1, S_2, \ldots, S_k$ of CG, arranged in reverse postorder (with respect
to $T$) of their roots. We also arrange the nodes within each $S_i$
in their reverse postorder. For this purpose we use an efficient
new algorithm, which differs from known algorithms for constructing
the strongly-connected components of a directed graph (cf. [AHU,
section 5.7]) in that it lists components in both external and
internal reverse postorder. This algorithm is presented in
Appendix A. In addition, for each strongly-connected component
$S_i$, we estimate the loop-connectedness parameter $d_i \equiv d(S_i, T)$.
Actually, since efficient computation of $d_i$ may not be possible in
general, we make do by overestimating it; our estimate
for $d_i$ is simply the number of back-edge target nodes (with respect
to $T$) in $S_i$, which will be denoted as $d'_i$. Obviously $d'_i \geq d_i$,
and $d'_i = d_i$ if $S_i$ consists of a single node. Note that a typical
call graph can be expected either to be acyclic (for non-recursive
programs), or at worst to contain several simple loops, each
corresponding to some simply-recursive procedure. Thus, for
such call graphs each component $S_i$ will indeed consist only
of a single node $p$, and accordingly $d_i = d'_i = 0$ if $p$ is
non-recursive, and $d_i = d'_i = 1$ if $p$ is recursive.
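The outputs of the call-graph analysis phase can be reproduced with the Python sketch below, with one caveat: it is *not* the Appendix A algorithm. It uses Tarjan's classical strongly-connected-components algorithm instead. Tarjan's algorithm emits components in postorder of their roots, so reversing its output gives the reverse postorder of roots. Nodes inside each component are then sorted by their DFS reverse postorder, and $d'_i$ is computed as the number of back-edge targets in $S_i$. The function name and the adjacency-dict input are assumptions. Only procedures reachable from `main` are covered, and the recursive DFS suits only modest call graphs.

```python
from typing import Dict, List, Tuple

def analyze_call_graph(succ: Dict[str, List[str]],
                       main: str) -> List[Tuple[List[str], int]]:
    """Return [(S_i with nodes in reverse postorder, d'_i)] for S_1, ..., S_k,
    listed in reverse postorder of their roots (DFS tree rooted at main)."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack, active, back_targets = set(), set(), set()
    postorder: List[str] = []
    comps: List[List[str]] = []      # Tarjan emits SCCs in postorder of roots

    def dfs(u: str) -> None:
        index[u] = low[u] = len(index)
        stack.append(u)
        on_stack.add(u)
        active.add(u)                # u is on the current tree path
        for v in succ.get(u, []):
            if v not in index:
                dfs(v)               # tree edge
                low[u] = min(low[u], low[v])
            else:
                if v in active:      # v is an ancestor of u: back edge
                    back_targets.add(v)
                if v in on_stack:
                    low[u] = min(low[u], index[v])
        active.discard(u)
        postorder.append(u)
        if low[u] == index[u]:       # u is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == u:
                    break
            comps.append(comp)

    dfs(main)
    rpo = {n: i for i, n in enumerate(reversed(postorder))}
    result = []
    for comp in reversed(comps):     # reverse postorder of roots
        nodes = sorted(comp, key=rpo.__getitem__)
        result.append((nodes, sum(1 for n in nodes if n in back_targets)))
    return result

# main calls a (simply recursive) and b; b calls c.
simple = analyze_call_graph({"main": ["a", "b"], "a": ["a"], "b": ["c"]}, "main")
# main calls p; p and q are mutually recursive.
mutual = analyze_call_graph({"main": ["p"], "p": ["q"], "q": ["p"]}, "main")
```

The first result matches the text's remark about simply-recursive procedures: only the recursive procedure a gets $d' = 1$. In the mutually recursive case, p and q form one component whose single back-edge target is p.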

(c) Elimination Phase. In this phase we iterate through the
program procedures in the following order: first we iterate
once through the strongly connected components of CG in their
postorder (i.e. in the order $S_k, S_{k-1}, \ldots, S_1$), and for each such
component $S_i$ we iterate through its procedures in their relative
postorder at most $2d'_i + 1$ times. Each procedure $p$ visited in
this iteration is subjected to an interval-based elimination
pass, quite similar to the elimination phase of the
intraprocedural algorithm given in the previous section, but preceded
by the following resetting of flow maps for call blocks: for
each call block $c$ and its (unique) successor $v$, we compute the
associated data-flow map $f_{(c,v)}$ (which is not yet available
from the initialization phase (a)) using the formula

(4.1)

$$f_{(c,v)} = \phi_{e_q},$$

where $q$ is the procedure called by $c$. Note that if $\phi_{e_q}$ is
still undefined (e.g. if $q$ has not yet been processed), then it
is taken to be $f_\Omega$.

While  computing  these  maps,  we  also  do  the  following  two 
things  to  speed  up  the  algorithm: 

(i) If $p$ is being reprocessed (as will happen if $p$ is recursive),
and for each call block $c$ and its successor $v$ the map $f_{(c,v)}$ has
not changed from its value in the last processing of $p$, then
obviously there is no need to process $p$ again, since the analysis
results will not have changed. If processing of all procedures
in $S_i$ has stabilized in this manner, then the processing of
$S_i$ has converged, and we go on to analyze the next component $S_{i-1}$.

(ii) Even if full convergence as in (i) has not yet been
achieved in $p$, we can still bypass a considerable part of the
reprocessing of $p$ by noting that, for a given interval $I$ in $p$,
it is pointless to re-analyze $I$ if no call-block flow maps
$f_{(c,v)}$, for $c$ a node in $I$ or in a subinterval of $I$, have
changed since the last processing of $p$. Skipping the
reprocessing of such intervals will speed up repeated processings
of a procedure substantially.
Later in this section we will show that the elimination
phase computes the correct values of the auxiliary maps $f_u$ and
the extended-flow maps $f_{(I,v)}$. (Note also that in the
interprocedural case these maps will account for flow along
interprocedurally valid balanced paths only; compare this to
the way in which the maps $\phi_n$ are defined in section 2.)

Remarks: (1) We do not compute the maps $\phi_n$ defined
in section 2 explicitly; however, they can easily be constructed
from the maps actually computed, as will be demonstrated below.
Nevertheless, the maps $\phi_{e_q}$ which are needed in the call-block
map resetting subphase are actually available, since $\phi_{e_q} = f_{e_q}$,
because $e_q$ always lies in the outermost interval of the
procedure $q$.

(2)   If  the  program  to  be  analyzed  is  nonrecursive,  then  all 
the  procedures  are  processed  just  once  in  their  postorder. 
It  is  easy  to  check  that  this  order  constitutes  an  'inverse 
invocation  order'  in  the  terminology  of  [Alp]. 
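The worst-case visiting order prescribed for phase (c) can be written down directly. The Python sketch below assumes the components are given as hypothetical (procedures in relative reverse postorder, $d'_i$) pairs listed as $S_1, \ldots, S_k$. It emits the full worst-case schedule and deliberately leaves out the early exits (i) and (ii), which would normally cut it short.

```python
from typing import List, Tuple

Component = Tuple[List[str], int]   # (procedures in reverse postorder, d'_i)

def worst_case_schedule(components: List[Component]) -> List[str]:
    """Procedures in the order phase (c) visits them if no early exit fires."""
    order: List[str] = []
    for nodes, d_prime in reversed(components):   # postorder: S_k, ..., S_1
        for _ in range(2 * d_prime + 1):          # at most 2 d'_i + 1 passes
            order.extend(reversed(nodes))         # relative postorder
    return order

# main calls p; p and q are mutually recursive (one component with d' = 1).
recursive = worst_case_schedule([(["main"], 0), (["p", "q"], 1)])
# A non-recursive program: each procedure is visited once, in postorder.
flat = worst_case_schedule([(["main"], 0), (["a"], 0), (["b"], 0)])
```

For the recursive example the schedule is q, p, q, p, q, p, main. Read backwards, it gives main, p, q, p, q, p, q.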

(d) Calculation of Data at Procedure Entries. This phase
establishes data values at procedure entries by solving Equations
(2.3). We convert these equations to data-flow equations for
the call graph CG by defining the map

(4.2)

$$g_{(p,q)} = \bigwedge\{\, \phi_c : c \text{ is a call from } p \text{ to } q \,\}$$

for each edge $(p,q) \in CG$, so that Equations (2.3) reduce to
the following equations (where $z_p$ denotes $x_{r_p}$, the data at entry
to $p$):

(4.3)

$$z_{\text{main}} = x_0,$$

$$z_p = \bigwedge\{\, g_{(q,p)}(z_q) : (q,p) \in CG \,\}, \quad \text{for each } p \neq \text{main}.$$
However, before attempting to solve these equations, we
first have to compute the maps $\phi_c$ appearing in (4.2), which
are not immediately available from phase (c). For this we use
the formula

(4.4)

$$\phi_c = f_c \circ f_{I^1(c)} \circ \cdots \circ f_{I^k(c)},$$

where $I^1(c) = \mathrm{intof}(c)$ is the interval containing $c$,
$I^{j+1}(c) = \mathrm{intof}(I^j(c))$, and $I^{k+1}(c)$ is the outermost interval
of the procedure $p$ containing $c$. (Of course, (4.4) is applicable
to any block $c$.) To justify (4.4), we observe that any
(interprocedurally valid and balanced) path from $r_p$ to $c$ can be
decomposed as the concatenation of paths: the first leads from
$r_p$ to (the entry of) $I^k(c)$ through nodes of $I^{k+1}(c) = r_p$, the
second leads from the entry to $I^k(c)$ through nodes of $I^k(c)$ to
the entry to $I^{k-1}(c)$, and so on. (Of course, this justification
still depends on the (still unproven) assertion that the final values
of the auxiliary maps $f_u$ computed in phase (c) correctly represent
the effect of the corresponding flows; this assertion will be proven
later in this section.)

Since  in  most  cases  the  program's  call  graph  will  be 
quite  simple  (and  in  any  case  its  size  can  be  expected  to  be 
much  smaller  than  the  sizes  of  the  flow  graphs  analyzed  in  the 
previous  phase),  solution  of  equations  (4.3)  will  generally  be 
easy. To get this solution, we use the following iterative
approach: we iterate once over the strongly-connected components
$S_i$ of the call graph in their relative reverse postorder, and
for each such $S_i$ we iterate over its procedures in their relative
reverse postorder, applying (4.3) until data stabilizes at their
entries, but no more than $d'_i + 1$ times. That this number of
iterations is sufficient can be shown using arguments similar
to those in [HU]. In fact, this iteration technique is applicable
to the solution of similar data-flow equations for any flow graph,
and is a trivial but significant improvement of the Hecht-Ullman
iteration method [HU]. It is also a special case of Kennedy's
node-listing technique [Ke].
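A Python sketch of this iteration for a forward-intersection problem follows. It is illustrative only. The maps $g_{(q,p)}$ are hypothetical (a, b) pairs with g(x) = (x & a) | b, and the meet is bitwise AND. A caller whose entry value has not yet been computed is treated as undefined and left out of the meet. Components arrive as (procedures in relative reverse postorder, $d'_i$) pairs in reverse postorder.

```python
import operator
from collections import defaultdict
from functools import reduce
from typing import Dict, List, Tuple

Map = Tuple[int, int]   # hypothetical bit-vector map: g(x) = (x & a) | b

def apply(f: Map, x: int) -> int:
    a, b = f
    return (x & a) | b

def solve_entries(components: List[Tuple[List[str], int]],
                  g: Dict[Tuple[str, str], Map], x0: int, main: str) -> Dict[str, int]:
    """Iterate Equations (4.3), at most d'_i + 1 passes per component."""
    preds: Dict[str, List[Tuple[str, Map]]] = defaultdict(list)
    for (q, p), m in g.items():
        preds[p].append((q, m))
    z = {main: x0}
    for nodes, d_prime in components:
        for _ in range(d_prime + 1):
            changed = False
            for p in nodes:
                if p == main:
                    continue                            # z_main = x_0 is fixed
                vals = [apply(m, z[q]) for q, m in preds[p] if q in z]
                if not vals:
                    continue                            # still undefined
                new = reduce(operator.and_, vals)       # meet = intersection
                if z.get(p) != new:
                    z[p], changed = new, True
            if not changed:
                break                                   # stabilized early
    return z

# main calls p, generating bit 0; p calls itself, killing bit 1.
g = {("main", "p"): (0xFF, 0b01), ("p", "p"): (0xFD, 0)}
z = solve_entries([(["main"], 0), (["p"], 1)], g, x0=0b10, main="main")
```

Here the recursive call reveals on the second pass that bit 1 can be killed before p is re-entered. The entry value of p therefore drops from 0b11 to 0b01, within the $d' + 1 = 2$ passes allowed.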
(e) Data Propagation. This is the simple final phase of our
algorithm. Here, using the entry information provided by phase
(d) and the auxiliary maps of phase (c), we propagate data to
each node in the program flow graph. This is done precisely as
in the propagation phase of the intraprocedural algorithm given
in the previous section, only here we begin the propagation by
setting $x_{r_p} = z_p$ for each procedure $p$. This propagation completes
our analysis.

We now turn to analyzing the performance of the elimination
phase (c) of our algorithm and to justifying its correctness.
We will investigate the behavior of phase (c) for arbitrary
orders of iteration through the program's procedures, and will
show that our iteration order is particularly efficient.

Let us first introduce some notations. Let $p$ be a procedure
and $n$ a node in $p$. Using the notations of [SP], we define
$IVP_0(r_p, n)$ as the set of all interprocedurally valid balanced
execution paths (i.e. interprocedural paths in which procedure
calls and returns are executed in a proper sequence, i.e.,
each return matches the last uncompleted call and all calls are
subsequently completed) leading from $r_p$ to $n$. For each path
$\pi \in IVP_0(r_p, n)$ we define $CS(\pi)$ as the set of all calling sequences
materialized during the execution of $\pi$, each such calling
sequence being the invocation-order sequence of all procedures
invoked and not yet completed, as some initial subpath $\pi'$ of $\pi$
is executed. Note the obvious one-one correspondence between
calling sequences and paths (without their initial node) in
the call graph CG.

Suppose now that our elimination phase (c) iterates
through the program procedures in some arbitrary order $p_1, p_2, \ldots, p_K$
(where repetition is allowed), and attains convergence.
The general results of [SP] ensure convergence provided that
sufficiently many iterations are applied, because every
'bitvectoring' data-flow framework has a finite semilattice $L$.
Let $p_k$ be the $k$-th procedure processed in the assumed
order. For any node $n$ in the program, let $\phi_n^{(k)}$ denote the value
of $\phi_n$ (given by (4.4)) immediately after $p_k$ has been processed.
It follows by standard arguments that this processing of $p_k$
will yield the fixpoint solution of Equations (2.2) for the
nodes of $p_k$, provided that each procedure $q$ called from $p_k$
is assumed to have the data-flow effect described by the
map $\phi_{e_q}^{(k-1)}$.

The significance of calling sequences is seen from the
following observation. As shown in [SP, section 3], for each
procedure $p$ and each node $n$ in $p$ the final value
of the maps $\phi_n$ satisfies

(4.5)

$$\phi_n = \bigwedge\{\, f_\pi : \pi \in IVP_0(r_p, n) \,\}.$$

Let $\pi \in IVP_0(r_p, n)$. At what point in phase (c) can we be
sure that $f_\pi$ has already been included in the meet which defines
the current value of $\phi_n$? A sufficient condition is given in
the following lemma:

Lemma 4.1: Suppose that phase (c) has already processed procedures
$p_1, p_2, \ldots, p_k$ in order and is now analyzing $p_{k+1} = p$. If the
reverse sequence of each calling sequence in $CS(\pi)$ is a
subsequence of $p_1, p_2, \ldots, p_k$, then, at the end of the current processing
of $p$, we have $f_\pi \geq \phi_n^{(k+1)}$.

Proof: It follows from a remark made above that when we have
finished processing $p$ we will have

(4.6)

$$\phi_n^{(k+1)} = \bigwedge\{\, f^*_\pi : \pi \in IVP_0(r_p, n) \,\},$$

where $f^*_\pi$ is defined as follows: consider $\pi$ as a path lying
wholly in $p$, paying no attention to the flow within invoked
subprocedures. Suppose that when mapped in this way $\pi$ becomes
$(r_p, s_1, s_2, \ldots, s_j, n)$. We then put

(4.7)

$$f^*_\pi = h_{(s_j,n)} \circ h_{(s_{j-1},s_j)} \circ \cdots \circ h_{(s_1,s_2)} \circ h_{(r_p,s_1)},$$

where

(4.7')

$$h_{(m,n)} = \begin{cases} f_{(m,n)}, & \text{if } m \text{ is not a call block},\\ \phi_{e_q}^{(k)}, & \text{if } m \text{ is a call to a procedure } q. \end{cases}$$

Our proof now proceeds by induction on $k$. If $k = 0$, then no
$\pi$ satisfying the assumptions of the lemma can contain procedure
calls. For such $\pi$ we have $f^*_\pi = f_\pi$, and the assertion of the
lemma follows immediately from (4.6).

Next suppose that the lemma is true up to and including
some $k \geq 0$, and let $\pi$ be a path satisfying the assumptions of
the lemma. Suppose that $\pi$, restricted to $p$, has the form
$(r_p, s_1, s_2, \ldots, s_j, n)$ as above. Then the actual path $\pi$ is
equal to $\pi_1 \| \pi_2 \| \cdots \| \pi_{j+1}$, where for each $i \leq j + 1$
(writing $s_0 = r_p$ and $s_{j+1} = n$)

$$\pi_i = \begin{cases} (s_{i-1}, s_i), & \text{if } s_{i-1} \text{ is not a call block},\\ \text{the subpath of } \pi \text{ corresponding to the execution} & \\ \quad \text{of the procedure called at } s_{i-1}, & \text{if } s_{i-1} \text{ is a call block}. \end{cases}$$

By the induction hypothesis we have $h_{(s_{i-1},s_i)} \leq f_{\pi_i}$ for
$i = 1, \ldots, j+1$, so that $f^*_\pi \leq f_\pi$; thus $\phi_n^{(k+1)} \leq f^*_\pi \leq f_\pi$
by (4.6). This proves the lemma. Q.E.D.


One immediate corollary of this lemma is that if the call
graph of the program is non-recursive (i.e. acyclic), then it is
sufficient to process each procedure once, if processing is in inverse
invocation order, to ensure stabilization of interprocedural attributes.
Indeed, if this order of processing is used, then at the time $p$
is processed, all procedures which can be called from $p$ either
directly or indirectly will already have been processed. Thus,
for each node $n$ in $p$ and each $\pi \in IVP_0(r_p, n)$, $\pi$ satisfies the
conditions of the preceding lemma. Consequently, when finished
with the processing of $p$ we will have $\phi_n \leq f_\pi$, so that by using
(4.5) it follows easily that the value of $\phi_n$ at the end of
processing is already equal to the desired value of this map.
This observation is essentially due to Allen [Alp].

Suppose next that the call graph is recursive (cyclic).
In this case paths in the call graph (and hence also calling
sequences) can be arbitrarily long, so that it is infeasible
to apply Lemma 4.1 to all paths $\pi$ in order to determine when
stabilization must occur.

However, in this case we can make use of the common device
of looking for a relatively small subset of the set of all
relevant paths $\pi$ which has the property that it is sufficient
to trace the flow through these paths to obtain the desired
data-flow quantities. Kennedy's node listing algorithm [Ke],
Kam and Ullman's study of the Hecht-Ullman algorithm [KU], and
Sharir and Pnueli's study of another interprocedural data-flow
technique [SP, section 5] all employ this technique.

Using the special properties of bit-vector data-flow
frameworks, we will now show the existence of an appropriate
'exhaustive' subset of paths. Each bit-vector data-flow
framework $(L,F)$ is 1-related in the terminology of [SP, section 5];
that is (suppressing the extension of $L$ by $\Omega$), $L$ can be decomposed
as $\{0,1\}^E$ for some (finite) set $E$, and each $f \in F$ admits a
decomposition $(f^\alpha)_{\alpha \in E}$ such that for each $x \in L$, $(f(x))_\alpha = f^\alpha(x_\alpha)$,
and each $f^\alpha$ is either constant (0 or 1) or the identity $\mathrm{id}$
on $\{0,1\}$. This implies that for each execution path $\pi = (n_1, n_2, \ldots, n_k)$
and each $\alpha \in E$, either $f^\alpha_{(n_i,n_{i+1})} = \mathrm{id}$ for all $i < k$, or else
there exists an index $s < k$ such that $f^\alpha_{(n_s,n_{s+1})}$ is constant
and $f^\alpha_{(n_i,n_{i+1})} = \mathrm{id}$ for all $i$ with $s < i < k$. In the first
case $f^\alpha_\pi = \mathrm{id}$, and in the second $f^\alpha_\pi = f^\alpha_{(n_s,n_{s+1})}$. Thus $f^\alpha_\pi$
depends on at most one point along $\pi$.
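The 1-relatedness argument can be checked on a concrete representation. In the Python sketch below, bit-vector maps are hypothetical (a, b) pairs with f(x) = (x & a) | b and b a subset of a. With that representation, each component $f^\alpha$ is constant 1 (bit in b), identity (bit in a only) or constant 0. The sketch composes the maps along a path and confirms, bit by bit, that the result equals the component of the last edge that is not the identity on that bit.

```python
from typing import List, Tuple

Map = Tuple[int, int]   # hypothetical: f(x) = (x & a) | b with b a subset of a
ALL = 0xFF              # assumption: E has 8 components
ID: Map = (ALL, 0)

def compose(f: Map, g: Map) -> Map:
    """f o g: apply g first, then f."""
    (af, bf), (ag, bg) = f, g
    b = (bg & af) | bf
    return ((ag & af) | b, b)

def component(f: Map, alpha: int) -> str:
    """The one-bit function f^alpha: constant '0', constant '1' or 'id'."""
    a, b = f
    if (b >> alpha) & 1:
        return "1"
    return "id" if (a >> alpha) & 1 else "0"

def path_map(edge_maps: List[Map]) -> Map:
    """f_pi for a path whose edge maps are listed in path order."""
    total = ID
    for f in edge_maps:
        total = compose(f, total)
    return total

def last_nontransparent(edge_maps: List[Map], alpha: int) -> str:
    """f_pi^alpha read off from the last edge whose alpha-component is constant."""
    result = "id"
    for f in edge_maps:
        if component(f, alpha) != "id":
            result = component(f, alpha)
    return result

# Edge 1 generates bit 0; edge 2 kills bit 0 and generates bit 1;
# edge 3 is transparent for every bit.
path = [(ALL, 0b001), (ALL & ~0b001, 0b010), ID]
```

For this path, bit 0 ends up killed (its last constant effect is the kill on edge 2), bit 1 is generated, and every other bit passes through unchanged.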
It is helpful to attach the following heuristic
interpretation to the above observation. Consider some specific
component $\alpha \in E$ of our bit vector. Then each flow effect can
be considered to be either an 'event' (corresponding to $f^\alpha = 1$) or
an 'anti-event' ($f^\alpha = 0$); in the absence of any event or anti-event,
we have 'transparency' ($f^\alpha = \mathrm{id}$). For example, in available expressions
analysis, each $\alpha \in E$ is an expression whose availability is to
be analyzed. An 'event' is a generation of $\alpha$, an 'anti-event'
is a kill of $\alpha$, and a transparent flow is one in which $\alpha$ is
neither generated nor killed. The flow effect of a path $\pi$ thus
depends on the last non-transparent edge effect in $\pi$ (if any):
if this is an event, then $\pi$ is said to create an event;
if this is an anti-event, $\pi$ is said to create an anti-event;
otherwise $\pi$ is transparent. In forward-intersection data-flow
problems, we wish to determine for each node $n$ whether all paths
leading to $n$ create an event, whereas in forward-union problems
(where lattice meet is set union rather than intersection),
we wish to determine whether there exists at least one
event-creating path leading to $n$.
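As a toy illustration of the event vocabulary, the Python sketch below handles a single expression $\alpha$. Each edge of a path is recorded as 'gen', 'kill' or None (transparent). The sketch classifies a path by its last non-transparent effect and then answers the forward-intersection and forward-union questions over an explicitly listed set of paths. Listing paths explicitly is for illustration only; the analysis itself never enumerates paths. Following the text, a transparent path does not count as creating an event.

```python
from typing import Iterable, Optional, Sequence

Action = Optional[str]   # "gen", "kill" or None (transparent) for one expression

def outcome(path: Sequence[Action]) -> str:
    """Classify a path by its last non-transparent edge effect."""
    last: Action = None
    for act in path:
        if act is not None:
            last = act
    return {"gen": "event", "kill": "anti-event", None: "transparent"}[last]

def all_paths_create_event(paths: Iterable[Sequence[Action]]) -> bool:
    """Forward-intersection question (e.g. is the expression available?)."""
    return all(outcome(p) == "event" for p in paths)

def some_path_creates_event(paths: Iterable[Sequence[Action]]) -> bool:
    """Forward-union question: does at least one path create an event?"""
    return any(outcome(p) == "event" for p in paths)

paths = [["gen", None], ["gen", "kill", "gen"], [None, "kill"]]
```

In this example the second path creates an event despite its intermediate kill, because only the last non-transparent effect matters. The third path ends with a kill, so the expression is not available, although a union problem would still report an event.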

Lemma 4.2: Let $(L,F)$ be a bit-vector data-flow framework as
above. Then for each $\alpha \in E$, procedure $p$, node $n$ in $p$ and path
$\pi \in IVP_0(r_p, n)$ there exists another path $\pi' \in IVP_0(r_p, n)$ (which
passes only through nodes of $\pi$) such that

(i) $f^\alpha_{\pi'} = f^\alpha_\pi$;

(ii) every calling sequence in $CS(\pi')$ is the concatenation
of two sequences, neither of which contains any procedure more
than once.

Proof: We can assume that $\pi$ does not satisfy (ii), for
otherwise we can simply take $\pi' = \pi$. Let us first describe our proof
heuristically. Suppose that $\pi$ is event-creating, and that this
event occurs in some node $m$ (possibly in another procedure).
Then, instead of tracing the flow of $\pi$, we can go (through nodes
of $\pi$) in the shortest possible way to $m$ and then in the shortest
possible way back to $n$. However, the fact that we insist on
interprocedural validity may add extra constraints on the above
path-shortening process. (Note that in intraprocedural analysis
the absence of such constraints simplifies the situation
considerably. For example, the initial part of the new path need
not even trace nodes of $\pi$, but can be any shortest acyclic path
from $r_p$ to $m$. Moreover, interprocedurally the terminal part of
the new path must consist of nodes of $\pi$, to ensure transparency
along the path after the event has occurred, but in the
intraprocedural case it can be chosen to be acyclic.)

To make this idea more precise, let $\pi = (r_p = n_1, n_2, n_3, \ldots, n_k, n)$
and let $s < k$ be the largest index such that $f^\alpha_{(n_s,n_{s+1})}$ is constant
(the last event or anti-event along $\pi$). If there is no such
index, $s$ is undefined. Suppose that some calling sequence
corresponding to some initial subpath $\pi_1$ of $\pi$ contains a procedure
$q$ twice. Then we can shorten $\pi$ as follows. $\pi_1$ contains two
different calls $c_1, c_2$ to $q$, where neither call is completed in
$\pi_1$ (though both are completed in $\pi$ later on). Let $\pi'$ be the
path obtained if we execute $\pi$ up to $c_1$, then execute $q$ in the
same way that $\pi$ does from the second time $q$ is invoked till the
corresponding return, but then return to the block following
$c_1$ instead of returning to the block following $c_2$; and finally
follow $\pi$ from there as if the first invocation of $q$ has been
completed. Obviously $\pi' \in IVP_0(r_p, n)$, and $\pi'$ is shorter than $\pi$.

If $\pi$ is transparent, then obviously $\pi'$ is also transparent, and
we can keep shortening $\pi'$ in this manner till (ii) is also
satisfied, and, in fact, till all calling sequences in $CS(\pi')$
contain each procedure at most once.

If $\pi$ is non-transparent, then we can carry out the above
shortening process as long as $n_s$ is not deleted (that is, $n_s$
is neither in the subpath of $\pi$ from $c_1$ to $c_2$, nor in the subpath
between the corresponding returns from $q$). Suppose that we have
shortened $\pi$ in this manner as much as possible to obtain $\pi'$
which still contains $n_s$. Obviously $f^\alpha_{\pi'} = f^\alpha_\pi$, and we claim
that $\pi'$ satisfies (ii). Indeed, if not, then there exists a
calling sequence materializing during the execution of $\pi'$ which
is not the concatenation of two sequences none of which contains
a procedure more than once. Let $T$ be that calling sequence,
and let $T_1$ be the largest initial subsequence of $T$ in which
no procedure appears more than once. Then we can write
$T = T_1 \| [p] \| T_2$, where $p$ appears also in $T_1$. Now either $p$
appears also in $T_2$, or else some other procedure $q$ appears
twice in $T_2$. In the first case we have three calls $c_1, c_2, c_3$
to $p$ along $\pi'$, none of which has been completed when the next
one is made, and in the second case we have four calls $c_1, c_2, c_3, c_4$
along $\pi'$, where $c_1, c_2$ are calls to $p$, $c_3, c_4$ are calls to $q$,
and where none of these calls has been completed when the next
one is made. Then it is clear that we can apply the above
shortening process to $\pi'$, using in the first case either the
calls $c_1, c_2$ or the calls $c_2, c_3$, and in the second case either
the calls $c_1, c_2$ or the calls $c_3, c_4$, to obtain a shorter path
which still contains $n_s$. This contradicts the definition of $\pi'$,
and it follows therefore that $\pi'$ satisfies (ii). Q.E.D.

Remark:     The  above  argument  is  very  similar  to  that  used  in 
Lemma  5.3  and  theorem  5.5  of  [SP]. 

Corollary 4.3: Let (L,F) be a bitvector data-flow framework.
Then, for each procedure p and each node n in p, we have

$$\phi_n = \bigwedge\{ f_\pi : \pi \in IVP(r_p, n) \text{ such that each calling sequence in } CS(\pi) \text{ is the concatenation of two sequences neither of which contains a procedure more than once} \} \qquad (4.8)$$
Proof: By Lemma 4.2, for each $\alpha \in L$ we have

$$\bigwedge\{ f_\pi(\alpha) : \pi \in IVP(r_p, n) \} = \bigwedge\{ f_\pi(\alpha) : \pi \text{ as in the right-hand side of (4.8)} \}$$

and the corollary follows immediately from (4.5). Q.E.D.

Lemma 4.1 and Corollary 4.3 can now be combined to deduce
the following 'node listing' principle:

Definition :  A  doubled  node  listing  for  the  call  graph  of  a 
program  is  a  sequence  S  of  procedures  having  the  property  that 
each  path  in  the  call  graph  which  is  a  concatenation  of  two 
acyclic  paths  (excluding  its  initial  node)  is  a  subsequence  of  S. 

Theorem 4.4: Let S be any doubled node listing for the call
graph, and suppose that phase (c) of our basic algorithm is
performed by processing procedures in the reverse of their order
in S. Then, at the end of one iteration through S, the algorithm
will converge and all data-flow maps will have their final
desired values.

Proof: Immediate from Lemma 4.1, Corollary 4.3 and the correspondence
between call graph paths and calling sequences. Q.E.D.

Note, however, that phase (c) does not compute the maps $\phi$
directly, but instead computes auxiliary maps $f$ for each flow
graph node n. It is a simple matter to extend Lemma 4.1 and
Corollary 4.3 to handle such maps as well, and we leave this as
an exercise to the reader. As a by-product of such arguments,
one can also prove (4.4) rigorously.

In  view  of  the  preceding  theorem,  analysis  of  the  elimination 
phase  of  our  algorithm  reduces  to  showing  that  the  iteration 
order  actually  used  in  this  phase  does  constitute  a  doubled 
node  listing  for  the  program's  call  graph.   The  order  in  which 
procedures  are  processed  during  elimination  is  the  reverse 
order  of  the  tuple 

$$S = \sum_{i=1}^{k} (2 d_i + 1) * S_i \qquad (4.9)$$
of procedures, where the procedures within each strongly connected
component $S_i$ of the call graph are arranged in their reverse tree
postorder,  and  where  summation  corresponds  to  tuple  concatenation 
while  multiplication  by  an  integer  represents  tuple  replication. 
To  show  that  this  S  is  indeed  a  doubled  node-listing  for  the 
call  graph,  we  proceed   as  follows: 

Lemma 4.5: Let $\pi$ be a path in CG. Then $\pi$ can be decomposed
as $\pi_1 \| \pi_2 \| \cdots \| \pi_m$, where each $\pi_j$ consists only of nodes
of $S_{i_j}$ and where $1 \le i_1 < i_2 < \cdots < i_m \le k$.

Proof: Suppose the contrary. Let $S_{i_1}, S_{i_2}, \ldots, S_{i_m}$ be the sequence
of strongly connected components of CG through which $\pi$ passes
in this order; for each $j \le m$ let $\pi_j$ be the part of $\pi$ that
lies within $S_{i_j}$. Then for at least one $j < m$ we have $i_j > i_{j+1}$.
By the strong connectivity of the components $S_i$ it follows that
there exists a path $\pi_0$ from the root p of $S_{i_j}$ to the root q of
$S_{i_{j+1}}$. Since $i_j > i_{j+1}$, p has a lower postorder index than q
in the depth-first spanning tree T which establishes the order
of the strong components $S_i$. Thus either q is an ancestor of
p in this tree or else q is to the right of p. In the first
case the existence of the path $\pi_0$ implies that p and q must
belong to the same strongly connected component. The second
case is impossible since the edge connecting $\pi_j$ to $\pi_{j+1}$ would
be a left-to-right cross edge. This contradiction proves our lemma.
Q.E.D.

Theorem 4.6: The S of (4.9) above is a doubled node listing for
the call graph.

Proof: Let $\pi$ be a path in the call graph which (ignoring its
initial node) is a concatenation of two acyclic paths $\pi'$ and $\pi''$.
Decompose $\pi$ as in the previous lemma. That lemma implies that
it is sufficient to prove that each $\pi_j$ is a subsequence of
$(2 d_{i_j} + 1) * S_{i_j}$. Hence we can assume with no loss of
generality that $\pi$ is a path in $S_i$ for some $i \le k$. Since both
$\pi'$ and $\pi''$ are acyclic, it follows from the definition of $d_i$
that $\pi$ contains no more than $2 d_i$ back edges. The nodes of $S_i$
are ordered in $S_i$ in such a way that the only edges from a
node to a previously listed node are back edges with respect
to T. Hence any part of $\pi$ between two successive back edges
is a subsequence of $S_i$, and it follows that $\pi$ is a subsequence
of $(2 d_i + 1) * S_i$. Q.E.D.
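The listing (4.9) and the subsequence property used in Theorem 4.6 are easy to state in code. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that the strongly connected components, their internal reverse tree postorder, and the parameters $d_i$ have already been computed.

```python
from typing import List, Sequence


def doubled_node_listing(components: List[List[str]], depths: List[int]) -> List[str]:
    """Build S = sum_i (2*d_i + 1) * S_i as in (4.9).

    components[i] is the strongly connected component S_i, already arranged
    in reverse tree postorder; depths[i] is its parameter d_i.
    '+' is tuple concatenation and 'k * S_i' repeats S_i k times.
    """
    listing: List[str] = []
    for comp, d in zip(components, depths):
        listing.extend(comp * (2 * d + 1))
    return listing


def is_subsequence(path: Sequence[str], listing: Sequence[str]) -> bool:
    """True if the nodes of path occur in listing in the same order
    (not necessarily contiguously)."""
    remaining = iter(listing)
    # 'node in remaining' advances the iterator just past the first match.
    return all(node in remaining for node in path)


# Components and d_i as given in Example 2 of this section:
# S_1 = [main], d_1 = 0; S_2 = [P, Q, R], d_2 = 2.
S = doubled_node_listing([["main"], ["P", "Q", "R"]], [0, 2])
print(S)
print(is_subsequence(["main", "P", "Q", "R", "Q", "P", "R"], S))  # True
print(is_subsequence(["P"] * 6, S))  # False: S holds only five copies of P
```

The sequences tested are chosen only to exercise the check; whether a given sequence is actually a path depends on the edges of the call graph, which this sketch does not model.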

Remark: This result is analogous to Hecht and Ullman's
bound [HU] on the number of iterations required for their
(intraprocedural) data-flow algorithm to converge, and also to
the  bound  on  iterations  through  interval  nodes  given  in  section 
3,  and  a  similar  bound  for  backward  problems  to  be  given  in  the 
next  section.   The  difference  between  Hecht  and  Ullman's  bound 
(d  +  1)  and  our  bound  (2  d  +  1)  reflects  additional  interprocedural 
constraints  on  execution  paths  that  do  not  allow  us  to  shorten 
paths  as  much  as  is  possible  in  the  intraprocedural  case. 

We shall now show that the bound 2d + 1 cannot be improved
in general. Consider a strongly connected component S consisting
of a single recursive procedure p. Here d = 1, and the above
results indicate that p may have to be processed 3 times. Indeed,
the following example shows that processing p only twice may
fail to produce the desired map values.
Example 1:

[Figure: flow graph of the recursive procedure p. It contains a
computation of a*b, a call block $c_1$: call p, an assignment ending
in ':= 1', a second call block $c_2$: call p, and the return block
$e_p$: return.]

Consider analysis of the availability of the single expression
a*b. To detect that $\phi_{c_1^+} = 0$ (i.e. that a*b may be killed
during the corresponding flow), we need to trace flow along the
path

$$\pi = (r_p, c_1, r_p, c_2, r_p, e_p, c_2^+, e_p, c_1^+)$$

and this can be done only by processing p three times. At the
end of a first iteration we will have $\phi_{e_p} = id$ and $\phi_{c_1^+} = \Omega$,
as only the middle path can be traced during that iteration.
During the second iteration we will have $\phi_{c_1^+} = 1$, since the
effect of the call $c_1$ is taken as $\phi_{e_p}$ as obtained from the
last iteration, i.e. id; but at the end of this iteration we
will arrive at the correct value $\phi_{e_p} = 0$ by propagating through
the right-hand path containing $c_2$. Using this value during the
third iteration over p will then yield the correct value for $\phi_{c_1^+}$.
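A minimal Python sketch can replay the three passes over p. The flow graph used here is an assumption reconstructed from the path $\pi$ above, not the original figure: $r_p$ branches to a call block $c_1$ that computes a*b before calling p, directly to $e_p$ (the middle path), and to a call block $c_2$ that kills a*b before calling p; both return nodes lead to $e_p$. Maps on the one-bit lattice are stored as pairs (f(0), f(1)), and None plays the role of the undefined map $\Omega$.

```python
from typing import Optional, Tuple

# A map on the one-bit lattice {0, 1} (1 = "a*b available") is stored as
# the pair (f(0), f(1)).  None stands for the undefined map Omega.
Map = Optional[Tuple[int, int]]
ID: Map = (0, 1)    # transparent
ZERO: Map = (0, 0)  # a*b killed
ONE: Map = (1, 1)   # a*b computed
NAMES = {None: "Omega", ID: "id", ZERO: "0", ONE: "1"}


def compose(g: Map, f: Map) -> Map:
    """g o f: apply f first, then g.  Omega absorbs."""
    if f is None or g is None:
        return None
    return (g[f[0]], g[f[1]])


def meet(*maps: Map) -> Map:
    """Pointwise AND over the defined maps; Omega is the unit of meet."""
    defined = [m for m in maps if m is not None]
    if not defined:
        return None
    return (min(m[0] for m in defined), min(m[1] for m in defined))


def process_p(psi_p: Map) -> Tuple[Map, Map]:
    """One pass over the assumed flow graph of p.

    psi_p is the effect of 'call p' as obtained from the previous pass.
    Returns (phi at c1+, phi at e_p), both measured from r_p.
    """
    phi_r = ID
    phi_c1_ret = compose(psi_p, compose(ONE, phi_r))   # c1: compute a*b, call p
    phi_c2_ret = compose(psi_p, compose(ZERO, phi_r))  # c2: kill a*b, call p
    phi_e = meet(phi_r, phi_c1_ret, phi_c2_ret)        # middle path and both returns
    return phi_c1_ret, phi_e


psi = None  # nothing is known about p before the first pass
history = []
for n in range(1, 4):
    phi_c1_ret, psi = process_p(psi)
    history.append((NAMES[phi_c1_ret], NAMES[psi]))
    print(f"pass {n}: phi_c1+ = {NAMES[phi_c1_ret]}, phi_e_p = {NAMES[psi]}")
```

Under these assumptions the printed values follow the narrative: after pass 1, $\phi_{e_p}$ = id and $\phi_{c_1^+}$ is still undefined; pass 2 gives $\phi_{c_1^+}$ = 1 and $\phi_{e_p}$ = 0; only pass 3 gives $\phi_{c_1^+}$ = 0.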
Example 2: Consider the call graph

[Figure: call graph on the procedures main, P, Q and R.]

which has the following depth-first spanning tree

[Figure: depth-first spanning tree of this call graph, rooted at main.]

Here there are two strongly connected components

$S_1 = [main]$, $d_1 = 0$

$S_2 = [P, Q, R]$, $d_2 = 2$

so that

S = [main, P, Q, R, P, Q, R, P, Q, R, P, Q, R, P, Q, R].

Suppose that the procedures in question have the following internal
form:

[Figure: flow graphs of P, Q and R, containing the call blocks
$c_1$: call Q, $c_2$: call R, $c_3$: call P, $c_4$: call R and $c_5$, and a node
marked x.]
where x denotes a kill of a*b, and where no other node in these
routines affects a*b. Then to deduce that $\phi_{e_R} = 0$ we have to
trace the following path

$$\pi = (r_R, c_5, r_Q, c_3, r_P, x, c_2, r_R, c_5, r_Q, c_3, r_P, e_P, c_3^+, e_Q, c_5^+, e_R, c_2^+, e_P, c_3^+, e_Q, c_5^+, e_R)$$

during which the calling sequence (Q, P, R, Q, P) materializes.
It is easy to see that if we process procedures in the reverse
order of S, then we will arrive at the correct value for $\phi_{e_R}$
only during our fifth iteration over R. Indeed, the following
table shows the flow effect through each procedure computed as
this procedure is processed:


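[Table: for each procedure processed, in the order R, Q, P repeated
five times, the flow effect computed for it (the legible entries are
id and 0); the value for R first becomes 0 during the fifth pass.]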

(Of course, in this case we do not claim that the above doubled
node listing S is optimal; all we claim is that if our standard
ordering of strongly connected components is used, then 5
repetitions of [P, Q, R] are necessary in this case.)
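The five passes can be checked with a small Python simulation. The internal shapes of P, Q and R are an assumption read off the path above, not the lost figure: P either returns directly or kills a*b at x and then calls R via $c_2$; Q only calls P via $c_3$; R only calls Q via $c_5$. The calls $c_1$ and $c_4$ are ignored. Each procedure's effect is recomputed from the effects of its callees as currently stored, in the reverse order of S.

```python
from typing import Dict, Optional, Tuple

# One-bit lattice {0, 1}; a map is stored as (f(0), f(1)).
# None is the undefined map Omega (no path traced yet).
Map = Optional[Tuple[int, int]]
ID: Map = (0, 1)
ZERO: Map = (0, 0)


def compose(g: Map, f: Map) -> Map:
    """g o f (apply f first); Omega absorbs."""
    if f is None or g is None:
        return None
    return (g[f[0]], g[f[1]])


def meet(*maps: Map) -> Map:
    """Pointwise AND over defined maps; Omega is the unit of meet."""
    defined = [m for m in maps if m is not None]
    if not defined:
        return None
    return (min(m[0] for m in defined), min(m[1] for m in defined))


def effect(proc: str, effects: Dict[str, Map]) -> Map:
    """Flow effect of one procedure, using the callee effects in 'effects'.

    Assumed (hypothetical) shapes:
      P: r_P -> e_P directly, or r_P -> x (kill) -> c_2: call R -> e_P
      Q: r_Q -> c_3: call P -> e_Q
      R: r_R -> c_5: call Q -> e_R
    """
    if proc == "P":
        return meet(ID, compose(effects["R"], ZERO))
    if proc == "Q":
        return effects["P"]
    if proc == "R":
        return effects["Q"]
    raise ValueError(proc)


S = ["main"] + ["P", "Q", "R"] * 5
effects: Dict[str, Map] = {"P": None, "Q": None, "R": None}
passes_over_R = 0
first_correct_pass = None
for proc in reversed(S):
    if proc == "main":
        continue  # the effect of main is not needed for phi_e_R
    effects[proc] = effect(proc, effects)
    if proc == "R":
        passes_over_R += 1
        if first_correct_pass is None and effects["R"] == ZERO:
            first_correct_pass = passes_over_R

print(first_correct_pass)
```

Under these assumptions the effect of R first reaches 0 on the fifth pass over R, in line with the claim in the text.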

An issue yet to be considered is the adaptation of our
technique to analysis of programs containing procedures with
parameters. Assume first that parameters are passed by value
(as is the case in SETL). Then only two modifications of our
method are required. Till now we have assumed that the flow
effect of a call block c (in regard to the elimination phase (c))
is identical with the flow effect of the invoked procedure q,
and accordingly have set $f_{(c,v)} = \phi_{e_q}$. To allow for the presence
of parameters we must replace $\phi_{e_q}$ in this assignment by another

map  which  reflects  both  the  binding  of  actual  arguments  to  formal 
parameters  and  the  stacking  and  unstacking  of  parameters  in 
recursive  cases.   This  new  map  may  not  belong  to  F,  so  that  the 

performance    analysis  which  we  have  given  for  the  elimination 
phase  (c)  may  no  longer  be  valid,  and  extra  iterations  may  be 
required to ensure convergence. Moreover, we have assumed that
during  the  entry-data  phase  (d)  data  does  not  change  between 
a  point  of  call  to  a  procedure  q  and  its  entry.   This  need  not 
be  true  if  q  has  parameters,  and  thus  equations  (4.2)  and  (4.3) 
have to be modified by replacing each map $\phi_c$ by another map
which  also  takes  the  pre-call  action  at  c  into  account  (e.g. 
the  transmission  of  input  arguments  to  input  parameters) . 
These  maps  may  again  not  be  of  the  bitvectoring  type,  which 
means  that  here  too  the  iteration  bounds  that  we  have  given 
may  no  longer  be  correct  and  extra  iterations  may  be  required. 
Of  course,  one  can  compromise  by  assuming  (as  we  did  in 
section  2)  that  value-passage  between  actual  arguments  and  formal 
parameters  at  a  procedure  call  c  is  made  explicit  in  the  code 
by  assignment-like  statements  before  and  after  the  call,  which 
are  then  analyzed  as  regular  assignments  independent  of  c  itself. 
This  approach,  while  safe,  can  lead  to  some  loss  of  accuracy, 
as  it  ignores  some  details  of  parameter  passage.   Nevertheless, 
this  approach  allows  the  algorithm  we  have  described  to  be 
used  without  modification  and  the  same  efficient  iteration 
bounds  still  apply. 

The  situation  becomes  considerably  more  complicated  when 
parameters  passed  by  reference  are  allowed.   To  retain  accuracy 
in  this  case,  our  algorithm  must  be  substantially  modified, 
to take the possibility of 'aliasing' into account (cf. [Ro], [Ba]).
We have ignored this problem as it does not arise
in optimizing SETL. However, it seems likely that a variant
of our method, preceded by some kind of aliasing analysis
(such as that described in [Ba]), can be applied in this case.
However,  it  will  probably  be  hard  to  avoid  some  loss  of  accuracy. 



5 .  Intraprocedural  Backward  Data-flow  Analysis 

In  this  section  we  describe  an  approach  to  intraprocedural 
backward  data-flow  problems  which  we  extend  to  the  interprocedural 
case  in  the  next  section. 

In backward data-flow analysis information is propagated
in the reverse direction of control flow, from program exits
backward. Such an analysis aims to determine at each program
point n what might (or must) occur after control has reached n.
Backward analysis is used for various purposes, e.g. to compute
the live-dead status of program variables ([He],[AU]), and also
in conjunction with a forward analysis in order to determine
safety and profitability of certain optimizations (see section
9 for a list of the analyses of this kind used in the SETL
optimizer).

One can always view backward data-flow analysis as a
forward analysis applied to the inverse flow graph. But if this
device is used, the relevant program structures, such as the
interval structure, may become unfavorable. Indeed, the inverse
flow graph is in general not reducible, and even basic blocks
may have multiple entries in that graph. Also, reachability
in the reverse graph is not guaranteed a priori. All this implies
that the most appropriate treatment of backward analysis will
tend to differ considerably from that of forward analysis as
described in sections 3 and 4.

In approaching the design of a 'backward' analyzer, a
first problem encountered is: where should data-values be
computed? An off-hand answer might be: at node exits. Each
such exit is associated with an arc $(m,n) \in G$, where n is a
successor of m. (Note that m may contain more than one exit
to n, but in our model all these exits are identified, since
we do not allow multiple edges between graph nodes.) In such
an approach, we would want to compute a data-value $z_{(m,n)} \in L$
for each $(m,n) \in G$, representing information known upon exit(s)
from m to n. Given an edge $(n,k) \in G$, let $f_{(n,k)} \in F$ denote
the data-flow map representing the effect of reverse flow through
n from its exit(s) to k back to its entry. Then we can write a
standard set of (intraprocedural) data-flow equations for the values
$z_{(m,n)}$ as follows (where $x_0$ denotes the null data-state that we
assume at program exits):

$$
z_{(m,n)} = \begin{cases} x_0, & \text{if } n \text{ is a program exit,} \\ \bigwedge\{ f_{(n,k)}(z_{(n,k)}) : (n,k) \in G \}, & \text{otherwise.} \end{cases} \qquad (5.1)
$$

It is easily seen that the maximal fixpoint solution of (5.1)
has the property that for each $(m,n) \in G$, $z_{(m,n)}$ is independent
of m and depends only on n (since what is known upon exit from m
to n is precisely what is known at entry to n). Therefore
equations (5.1) are equivalent to the following equations, where
for each graph node n, $x_n$ denotes data known at entry to n:


$$
x_n = \begin{cases} x_0, & \text{if } n \text{ is a program exit,} \\ \bigwedge\{ f_{(n,k)}(x_k) : (n,k) \in G \}, & \text{otherwise.} \end{cases} \qquad (5.2)
$$


In what follows, we will solve equations (5.2) rather than (5.1).
Note  that  by  computing  data  per  node  rather  than  per  edge  we 
also  save  a  considerable  amount  of  space. 
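To make (5.2) concrete, here is a plain round-robin solver in Python. It is only a baseline sketch and not the interval-based technique of this section. The example graph, the use/def sets, and the names solve_backward and live_map are hypothetical. Live variables, with the meet taken as set union, stand in for a generic bitvectoring problem, and each $f_{(n,k)}$ happens to depend only on n.

```python
from typing import Callable, Dict, FrozenSet, List

State = FrozenSet[str]


def solve_backward(
    succ: Dict[str, List[str]],
    exits: List[str],
    x0: State,
    edge_map: Callable[[str, str, State], State],
    order: List[str],
) -> Dict[str, State]:
    """Round-robin iteration of equations (5.2).

    For a 'may' problem such as liveness the meet is set union, so the
    initial (top) value is the empty set.
    """
    x: Dict[str, State] = {n: frozenset() for n in succ}
    for n in exits:
        x[n] = x0
    changed = True
    while changed:
        changed = False
        for n in order:
            if n in exits:
                continue
            new = frozenset().union(*(edge_map(n, k, x[k]) for k in succ[n]))
            if new != x[n]:
                x[n] = new
                changed = True
    return x


# Hypothetical procedure: i and s are initialized, then a loop runs while i < n.
succ = {"entry": ["loop"], "loop": ["body", "done"], "body": ["loop"],
        "done": ["exit"], "exit": []}
use = {"entry": set(), "loop": {"i", "n"}, "body": {"i", "s"}, "done": {"s"}}
defs = {"entry": {"i", "s"}, "loop": set(), "body": {"i", "s"}, "done": set()}


def live_map(n: str, k: str, x_k: State) -> State:
    # Reverse flow through n: variables used in n, plus those live at k
    # that n does not redefine (independent of k in this example).
    return frozenset(use[n] | (x_k - defs[n]))


x = solve_backward(succ, ["exit"], frozenset(), live_map,
                   order=["done", "body", "loop", "entry"])
print({n: sorted(v) for n, v in x.items()})
```

Visiting nodes roughly against the flow direction (done, body, loop, entry) lets each pass propagate information several steps backward; here the solution is stable after the second pass and the third pass only confirms it.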

We  will  now  describe  an  interval-based  technique  for  the 
solution  of  equations  (5.2)   for  analyses  of  the  bitvectoring 
class.   This  technique  is  similar  in  overall  design  to  the 
technique  for  forward  problems  described  in  Section  3,  although 
significant  differences  between  the  two  do  exist. 

As in the forward case, we assume that a preliminary pass
through the program code has computed, for each nonvirtual edge
$(m,n) \in G$, a data-flow map $f_{(m,n)} \in F$ as defined above. To
analyze a single procedure p intraprocedurally, our technique
proceeds through the following phases:
(1) Elimination phase

In this phase we iterate through the intervals of p in their
reverse preorder (innermost to outermost). For each such interval
I we compute two kinds of data-flow maps: (i) For each
successor v of I we compute a map $f_{(I,v)}$ describing the effect
of flow through I to the entry of v. (ii) For each node $u \in I$
and each successor v of I we compute an auxiliary map $f_{(u,v)}$
describing the effect of flow from the start of u, through I,
to entry to v.

Since  equations  (5.2)  propagate  information  from  successors 
of  a  given  node  back  to  that  node,  it  follows  that  in  computing 
the  above  maps   functional  composition  should  be  taken  in 
reverse-flow  direction.   More  precisely,  the  following  formulae 
should  be  applied  in  backward  analysis: 

For auxiliary maps: For each successor v of I, first set
$f_{(v,v)} = id$, and then use the formulae

$$f_{(u,v)} = \bigwedge\{ f_{(u,w)} \circ f_{(w,v)} : (u,w) \in G \text{ and } (w \in I \text{ or } w = v) \} \qquad (5.3)$$

for each $u \in I$.

For extended-flow maps:

$$f_{(I,v)} = f_{(I,h)} \circ f_{(h,v)} \qquad (5.4)$$

for each successor v of I (where h is the head of I).

Equations (5.3) can be solved by iterating through
the nodes of I in reverse interval order (i.e. in postorder).
If I is a proper interval, then three iterations through its
nodes are required to guarantee convergence of the solution.
The reason why an extra iteration (as compared with the
forward case) may be required is that, in order to record
an event (or anti-event) in $f_{(u,v)}$ for some $u \in I$ and v a
successor of I, flow has to be traced backward from v to the
node w at which this event occurs along an acyclic path,
and then from w to u along another acyclic path. It is easily
seen that, for any interval I, this tracing may require up to
2d + 1 reverse iterations through the nodes of I, where d is
the loop-connectedness parameter of I, so that for proper
intervals (for which d = 1) three iterations might be needed.
(As an example consider the following interval I:

[Figure: an interval I with head h, containing the nodes u, w and x,
and having the successor v.]

In this example, to record an event at node w in $f_{(u,v)}$,
one has to trace at least the path (u, h, w, h, x, v), which loops
back to the head h twice, and therefore three iterations are
really required.)
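The following Python sketch iterates equations (5.3) over a small interval to show the three-pass behavior. The interval is a hypothetical stand-in for the lost figure, built only from the path (u, h, w, h, x, v): h has edges to u, w and x; u and w branch back to h; x exits to v. The interval order h, u, w, x is also assumed. An event is modeled as the constant map 0, the meet is pointwise AND, and None stands for a map that is still undefined.

```python
from typing import Dict, Optional, Tuple

# One-bit lattice {0, 1}; a map is stored as (f(0), f(1)).
# None is the undefined map Omega.
Map = Optional[Tuple[int, int]]
ID: Map = (0, 1)
ZERO: Map = (0, 0)  # an event at a node, modeled as a constant map


def compose(g: Map, f: Map) -> Map:
    """g o f (apply f first); Omega absorbs."""
    if f is None or g is None:
        return None
    return (g[f[0]], g[f[1]])


def meet(*maps: Map) -> Map:
    """Pointwise AND over defined maps; Omega is the unit of meet."""
    defined = [m for m in maps if m is not None]
    if not defined:
        return None
    return (min(m[0] for m in defined), min(m[1] for m in defined))


# Assumed interval I with head h and single successor v.
I = ["h", "u", "w", "x"]                    # interval order
succ = {"h": ["u", "w", "x"], "u": ["h"], "w": ["h"], "x": ["v"]}
edge = {(n, k): ID for n in succ for k in succ[n]}
edge[("w", "h")] = ZERO                     # the event occurs in node w

aux: Dict[str, Map] = {n: None for n in I}  # f_(n,v), initially undefined
aux["v"] = ID                               # f_(v,v) = id
last_change: Dict[str, int] = {}
passes = 0
changed = True
while changed:
    passes += 1
    changed = False
    for n in reversed(I):                   # reverse interval order
        # (5.3): meet over successors k of n of f_(n,k) o f_(k,v)
        new = meet(*(compose(edge[(n, k)], aux[k]) for k in succ[n]))
        if new != aux[n]:
            aux[n] = new
            changed = True
            last_change[n] = passes

print(aux["u"], last_change["u"], passes)
```

Here the map for u reaches its final value 0 only in the third pass; the fourth pass merely confirms that nothing changes.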

In much the same way as in the analysis of the forward case,
the above argument implies that for each source of irreducibility
imbedded within an improper interval I, two extra
iterations through the nodes of I may be required. Even though
this doubles the number of extra iterations required to handle
imbedded irreducibilities as compared with the forward
case, the degradation of algorithm performance will still
be very mild.

As  to  the  outermost  interval  I  of  p,  one  reverse  iteration 
through  the  nodes  of  I  is  sufficient  if  I  is  proper,  and  two 
additional  iterations  are  required  per  each  imbedded  source  of 
irreducibility. This interval is treated in a somewhat
different manner from the other intervals, as it does not
have  any  successors,  so  that  auxiliary  maps  cannot  be  defined 
for  its  nodes  in  the  same  way  as  for  nodes  of  inner  intervals. 




To adjust for this slight difference, we regard the exit block $e_p$
and the stop block $s_p$ as 'successors' of I, which requires us
to compute the auxiliary maps $f_{(u,e_p)}$, $f_{(u,s_p)}$ for each node
$u \in I$.

(2) Second elimination phase.

This phase appears only in the solution of backward problems
and does not correspond to any phase of the forward algorithms.
Our aim in this phase is to compute a second kind of auxiliary
map defined as follows: For each node u in the procedure p
being analyzed, we compute a map $fe_u$ describing the effect
of reverse flow from the exit(s) of p back to (the start of) u.
These maps are essentially the maps $\psi$ described in Section 2
for interprocedural purposes, and are analogous to the maps $\phi$
that we compute implicitly in the elimination phase of the
forward analysis, but here we prefer to compute them explicitly,
since no simple formula analogous to (4.4) exists for direct
construction of these maps. (Note, however, that it is only
in the interprocedural case that such a computation has to be
carried out separately from data propagation. Nevertheless, we
separate these two phases even in the intraprocedural case to
make our intraprocedural and interprocedural approaches agree.
See the next section for more details.)

Computation of the maps $fe_u$ proceeds as follows: We iterate
through the intervals of p in their preorder (from outermost to
innermost). Consider first the outermost interval I. For each
$u \in I$ we can compute

$$fe_u = f_{(u,e_p)} \wedge \big( f_{(u,s_p)}(x_0) \big) \qquad (5.5)$$

i.e. take the meet of the map $f_{(u,e_p)}$ with the constant map
$f_{(u,s_p)}(x_0)$. (We treat return and stop blocks differently
mainly for interprocedural reasons, since interprocedurally
the data-state known at $e_p$ is usually different from the null
value $x_0$, which is assumed at program exits. Note that $e_p$ is
not a program exit, whereas $s_p$ is.)
Next suppose that our iteration has come to process an
inner interval I. Then for each $u \in I$ we compute

$$fe_u = \bigwedge\{ f_{(u,v)} \circ fe_v : v \text{ a successor of } I \} \qquad (5.6)$$

Note that any interval containing a successor v of I (i.e.
intof(v)) must be an ancestor interval of I, i.e. $intof(v) = intof^k(I)$
for some $k \ge 1$, so that $fe_v$ will already have
been computed for all successors v of I when we apply (5.6)
to nodes of I.

(A small technical problem arises in connection with
endless loops. Suppose that L is such a loop, i.e. a strongly
connected program region with no successors. While such loops
do not create any problems in forward analysis, since they are
reachable from the program (or procedure) entry, they are
somewhat problematic in backward analysis, since such loops
cannot reach any program exit, and so no information can be
propagated to them. Noting that such loops can only appear
in the outermost interval, either as a single node (interval)
if it is single entry, or as an (irreducible) collection of nodes
otherwise, we can solve this technical problem by adding
an edge $(x, s_p)$ for each node x in the outermost interval which
does not reach $e_p$ or $s_p$. Assuming this to have been done,
the $fe_u$ maps will then be properly defined at each node of
the procedure.)
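The graph adjustment for endless loops can be sketched in Python: a backward breadth-first search from $e_p$ and $s_p$ finds the nodes that can reach an exit, and every other node of the outermost interval receives an edge to $s_p$. The adjacency-list representation, the example graph and the function name add_stop_edges are hypothetical, and the sketch does not address which data-flow map the added edges should carry.

```python
from collections import deque
from typing import Dict, List


def add_stop_edges(succ: Dict[str, List[str]], outer: List[str],
                   e_p: str, s_p: str) -> List[str]:
    """Add an edge (x, s_p) for every node x of the outermost interval
    that reaches neither e_p nor s_p.  Returns the nodes that got one."""
    pred: Dict[str, List[str]] = {n: [] for n in succ}
    for n, ks in succ.items():
        for k in ks:
            pred.setdefault(k, []).append(n)
    # Backward reachability from the two exits.
    reaches = {e_p, s_p}
    queue = deque(reaches)
    while queue:
        k = queue.popleft()
        for n in pred.get(k, []):
            if n not in reaches:
                reaches.add(n)
                queue.append(n)
    stuck = [x for x in outer if x not in reaches]
    for x in stuck:
        succ[x].append(s_p)
    return stuck


# Hypothetical procedure: r either returns via a, or enters the endless loop l1 <-> l2.
succ = {"r": ["a", "l1"], "a": ["e_p"], "l1": ["l2"], "l2": ["l1"],
        "e_p": [], "s_p": []}
added = add_stop_edges(succ, ["r", "a", "l1", "l2"], "e_p", "s_p")
print(added)  # ['l1', 'l2']
```

The reachability set is computed before any edge is added, so every node that originally reached neither exit gets its own edge, as the text prescribes.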

(3) Propagation phase:

This final phase of backward analysis is relatively trivial
in view of the preparation for it accomplished during the second
elimination phase. Let $x_p \in L$ be the data-state known at return
from a procedure p. (In the intraprocedural case $x_p$ will be
the standard null data state $x_0$; in interprocedural analysis,
$x_p$ will differ from one procedure to another in the manner
described in the next section.) For each node u in p, we
compute $x_u$, the data-state at the start of u, using the simple
formula

$$x_u = fe_u(x_p) \qquad (5.7)$$

which completes our intraprocedural backward analysis.
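The second elimination phase and the propagation phase amount to a few map operations once the first elimination phase is done. The Python sketch below applies (5.5), (5.6) and (5.7) to made-up outputs of the first phase: an outermost interval with nodes r and t, and a single inner interval with nodes h and b whose only successor is t. All node names and map values are hypothetical. Maps on the one-bit lattice are pairs (f(0), f(1)), with None for an undefined map (no path).

```python
from typing import Dict, Optional, Tuple

# One-bit lattice {0, 1}; a map is stored as (f(0), f(1)).
# None is the undefined map Omega (no path through the region).
Map = Optional[Tuple[int, int]]
ID: Map = (0, 1)
ZERO: Map = (0, 0)
ONE: Map = (1, 1)


def compose(g: Map, f: Map) -> Map:
    """g o f (apply f first); Omega absorbs."""
    if f is None or g is None:
        return None
    return (g[f[0]], g[f[1]])


def meet(*maps: Map) -> Map:
    """Pointwise AND over defined maps; Omega is the unit of meet."""
    defined = [m for m in maps if m is not None]
    if not defined:
        return None
    return (min(m[0] for m in defined), min(m[1] for m in defined))


def constant(f: Map, value: int) -> Map:
    """The constant map whose value is f(value); Omega if f is Omega."""
    return None if f is None else (f[value], f[value])


def apply_map(f: Map, value: int) -> Optional[int]:
    return None if f is None else f[value]


x0 = 1  # assumed null data-state at program exits

# Hypothetical outputs of the first elimination phase.
# Outermost interval, nodes r and t: (f_(u,e_p), f_(u,s_p)).
outer = {"r": (ZERO, ID), "t": (ID, None)}
# Inner interval, nodes h and b, whose only successor is t: f_(u,t).
inner = {"h": {"t": ONE}, "b": {"t": ID}}

fe: Dict[str, Map] = {}
for u, (f_e, f_s) in outer.items():       # (5.5), outermost interval first
    fe[u] = meet(f_e, constant(f_s, x0))
for u, by_succ in inner.items():          # (5.6), fe_v already known for v = t
    fe[u] = meet(*(compose(f_uv, fe[v]) for v, f_uv in by_succ.items()))

x_p = x0  # intraprocedural case: data at return is the null state
x = {u: apply_map(fe[u], x_p) for u in fe}  # (5.7)
print(x)  # {'r': 0, 't': 1, 'h': 1, 'b': 1}
```

Visiting the outermost interval before the inner one mirrors the preorder traversal in the text, which guarantees that $fe_v$ exists for every successor v when an inner interval is processed.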
6.    Interprocedural  backward  data-flow  analysis

In  this  section  we  extend  the  intraprocedural  algorithm 
described  in  the  previous  section  to  the  interprocedural  case. 
The  required  modifications  are  rather  similar  to  those  used 
for  interprocedural  forward  problems  in  Section  4,  and  we 
will  mainly  point  out  the  differences  between  these  two 
techniques.   More  details  are  provided  in  Sections  2  and  4. 

Our  algorithm   consists  of  the  following  phases: 

(a)  Call  graph  analysis:   here  we  perform  exactly  the  same 
analysis  as  is   described  in  Section  4.   (As  already  noted, 
call  graph  analysis  should  be  carried  out  once  at  the  begin- 
ning of  optimization,  prior  to  any  other  interprocedural 
data-flow  analysis.) 

(b) Initialization: This phase associates data-flow maps
$f_{(m,n)}$ of the analysis with each nonvirtual edge $(m,n) \in G$
such that m is not a call block.

(c)  Interprocedural  elimination  phase:   In  this  phase  we 
iterate  through  the  procedures  of  the  program  in  the  same  order 
as  is  described  in  Section  4.   Each  procedure  p  visited  in  this 
iteration  is  subjected  to  an  elimination  pass  quite  similar  to 
that  described  in  Section  5,  which  involves  adjustment  of 
data-flow  maps  for  call  blocks  and  tests  for  convergence  as 

in Section 4. Note, however, that in backward analysis the
data-flow map $\psi_q$ describing the effect of flow through a
procedure q is computed as

$$\psi_q = f_{(r_q,e_q)} \wedge \big( f_{(r_q,s_q)}(x_0) \big) \qquad (6.1)$$

where $x_0$ is the assumed null data-state at program exits (compare
with (5.5)).

The  theoretical  bounds  on     algorithm  performance  derived 
in  Section  4  apply   in  this  case  also.   The  necessary  theorems 
and  proofs  are  analogous  to  those  given  in  Section  4,  for  which 
reason  we  omit  them.

(d)  Second  intraprocedural  elimination  phase:   This  is  performed 
for  all  routines  in  the  program  being  analyzed,  and  within  each 
routine  uses  exactly  the  technique  described  in  Section  5. 

(e) Computation of data at procedure exits: This interprocedural
phase is very similar to the phase which calculates
entry-data in forward analysis (cf. Section 4). For each
procedure p, let $z_p = x_{e_p}$ denote the data state at exit (return)
from p. If p is the main program, then (since we assume that
the main program is nonrecursive) execution terminates when
the exit point of p is reached. Thus we can write the following
set of equations, whose heuristic meaning should be
self-explanatory:


$$
\begin{aligned} z_{main} &= x_0 \\ z_p &= \bigwedge\{ fe_v(z_q) : v \text{ follows immediately a call to } p \text{ in } q \} \end{aligned} \qquad (6.2)
$$


As in Section 4, it is convenient to transform (6.2) into a data-flow
problem for the call graph, by defining, for each
$(p,q) \in CG$,

$$a_{(p,q)} = \bigwedge\{ fe_v : v \text{ follows immediately a call to } q \text{ in } p \} \qquad (6.3)$$

This  transforms  (6.2)  into 


$$
\begin{aligned} z_{main} &= x_0 \\ z_p &= \bigwedge\{ a_{(q,p)}(z_q) : (q,p) \in CG \} \end{aligned} \qquad (6.4)
$$


Note  that  equations  (6.4)  define  a  forward   data-flow  problem 
rather  than  a  backward  problem  for  CG.    This  can  be  explained 
as  follows:   In  the  forward  case,  information  is  propagated 
from  the  entry  of  a  calling  procedure  to    the  entry  of  a  called 
procedure.   In  the  backward  case,  information  is  propagated
backward  from  the  exit  of  a  calling  procedure  to  the  exit  of 
a  called  procedure;  but  both  propagations  induce  a  forward 
propagation  in  the  call  graph. 

The maximal fixpoint solution of (6.4) is found precisely
as in Section 4, using the same iteration order over procedures,
so that the performance bounds described in Section 4 remain
valid.
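Since (6.4) is an ordinary forward problem on the call graph, a small Python sketch suffices to show how exit data flows from callers to callees. The procedures main, A and B, the maps $a_{(q,p)}$, and the function name solve_exit_data are hypothetical. The sketch uses plain round-robin iteration rather than the ordering of Section 4, and takes the maps $a_{(p,q)}$ as given instead of deriving them from (6.3).

```python
from typing import Dict, List, Optional, Tuple

Map = Tuple[int, int]  # (f(0), f(1)) on the one-bit lattice {0, 1}
ID: Map = (0, 1)
ONE: Map = (1, 1)


def solve_exit_data(procs: List[str], a: Dict[Tuple[str, str], Map],
                    x0: int, main: str = "main") -> Dict[str, Optional[int]]:
    """Round-robin solution of (6.4).

    None means 'no information yet' (the top element); the meet of
    defined values is AND, computed here with min.
    """
    z: Dict[str, Optional[int]] = {p: None for p in procs}
    z[main] = x0
    changed = True
    while changed:
        changed = False
        for p in procs:
            if p == main:
                continue
            # a_(q,p)(z_q) for every call-graph edge (q, p) with z_q known
            vals = [f[z[q]] for (q, r), f in a.items()
                    if r == p and z[q] is not None]
            new = min(vals) if vals else None
            if new != z[p]:
                z[p] = new
                changed = True
    return z


# Hypothetical call graph: main calls A and B; A and B call each other.
a = {("main", "A"): ONE, ("main", "B"): ID, ("A", "B"): ID, ("B", "A"): ID}
z = solve_exit_data(["main", "A", "B"], a, x0=0)
print(z)
```

In this example $z_A$ is first set to 1 from main alone and is lowered to 0 once the value for B becomes known, which is why the iteration must continue until nothing changes.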

(f) Final propagation phase: This phase is very similar to
the final intraprocedural propagation phase described in Section 5.
For each procedure p, let $z_p$ be the data state at its
return point, as computed in phase (e). Then for each node u
in p we compute the data-state $x_u$ at the start of u as follows:

$$x_u = fe_u(z_p) \qquad (6.5)$$

This completes our analysis.
7 .    Code  motion

In  this  section  we  describe  a  technique  for  performing 
code  motion  as  part  of  a  data-flow  analysis.   The  technique 
is  surprisingly  simple  and  straightforward,   but 
it  rests  on  heuristic  assumptions  concerning  the  safety  and 
profitability  of  motion  which   are   quite  adequate  for  SETL, 
but  may  not  be  generally   applicable. 

That  code  motion  and  data-flow  analysis  are  closely 
related   can  be  seen  by  considering  the  typical  case  of 
available  expressions  analysis.   In  this  analysis  a  'T-event'  is 
a  computation  of  an  expression  T,  and   a  corresponding 
'anti-event'   is  a  re-definition  of  a  variable  on  which  T 
depends. In this analysis we determine for each program point
n the expressions T which have the property that every path
leading from program entry (or procedure entry) to n contains
an event for T which is not followed by an anti-event. If T
is such an expression, then a computation of
T at n is redundant and can be eliminated. In more
general cases this recomputation may not be unconditionally
redundant, but may become so if we insert another earlier event
for T at a point having lower execution frequency. When this is
the case it may be profitable to insert a preceding event (i.e.
computation), thereby eliminating the need to compute T at n
and reducing the total number of computations of T. This code
transformation  is  known  as  code   motion   even  though,  strictly 
speaking,  nothing  is  really  moved.   (See  [AU] ,  [Sc] ,  [MR] ,  [MFS] 
for  various  code  motion  algorithms.) 
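
A minimal Python sketch may make the event/anti-event reading
concrete. It assumes a hypothetical encoding in which each basic block
is a list of ('event', T) and ('anti', T) pairs, and it uses plain
round-robin iteration to the maximal fixpoint rather than the
interval-based elimination of Section 3; the block names and
expressions are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: ('event', T) computes expression T;
# ('anti', T) redefines a variable on which T depends.
Block = List[Tuple[str, str]]


def transfer(block: Block, avail_in: Set[str]) -> Set[str]:
    """Expressions available at the end of the block, given those at its start."""
    avail = set(avail_in)
    for kind, t in block:
        if kind == "event":
            avail.add(t)
        else:
            avail.discard(t)
    return avail


def available_at_entry(blocks: Dict[str, Block], preds: Dict[str, List[str]],
                       entry: str, universe: Set[str]) -> Dict[str, Set[str]]:
    """Maximal fixpoint with intersection as meet (simple round-robin iteration)."""
    avail_in = {n: set() if n == entry else set(universe) for n in blocks}
    changed = True
    while changed:
        changed = False
        for n in blocks:
            if n == entry:
                continue
            outs = [transfer(blocks[p], avail_in[p]) for p in preds[n]]
            new = set.intersection(*outs) if outs else set()
            if new != avail_in[n]:
                avail_in[n], changed = new, True
    return avail_in


def redundant_events(blocks: Dict[str, Block],
                     avail_in: Dict[str, Set[str]]) -> List[Tuple[str, int, str]]:
    """(block, position, T) for every T-event whose value is already available."""
    found = []
    for n, block in blocks.items():
        avail = set(avail_in[n])
        for i, (kind, t) in enumerate(block):
            if kind == "event" and t in avail:
                found.append((n, i, t))
            if kind == "event":
                avail.add(t)
            else:
                avail.discard(t)
    return found


# Diamond B0 -> {B1, B2} -> B3: a*b is computed on both paths into B3.
blocks = {
    "B0": [("event", "a*b")],
    "B1": [("anti", "a*b"), ("event", "a*b"), ("event", "c+d")],
    "B2": [],
    "B3": [("event", "a*b"), ("event", "c+d")],
}
preds = {"B0": [], "B1": ["B0"], "B2": ["B0"], "B3": ["B1", "B2"]}
avail = available_at_entry(blocks, preds, "B0", {"a*b", "c+d"})
print(redundant_events(blocks, avail))  # [('B3', 0, 'a*b')]
```

Only the a*b event in B3 is reported: c+d is computed on only one of
the two paths into B3, so its computation there becomes redundant only
if an earlier c+d event is inserted on the other path.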

Abstracting from this example, we say that a (forward) data-flow
analysis of the bitvectoring class is amenable to code motion if it
has the following property: whenever an event occurs at a program
point n and every path from the program entry to n contains a similar
event not followed by a corresponding anti-event, the event at n is
redundant, and the actual operation(s) which realize this event can be
eliminated without changing the overall program behavior. Among the
data-flow analyses amenable to code motion we may mention:

(a) Copy optimization, in which an event occurs whenever we create an
unshared value for an object (either by copying the object or by
computing a new value for it), and any operation causing the value of
the object to become shared is viewed as the corresponding
anti-event. (Note, however, that in this case only value-copying
events can be redundant.)

(b) Conversion optimization. This analysis arises (e.g. in the SETL
optimizer) when program objects can be given more than one data
representation, making it necessary to determine where conversions
(or checks) between different representations must be inserted. In
this analysis an event (for a specific variable V and a specific
representation R) corresponds to the conversion of V to the
representation R, or to the assignment to V of a value having that
representation; each conversion of V to any representation other than
R, as well as each assignment to V of a value having a representation
different from R, counts as an anti-event for V and R (see Section 9
for more details).

Any code motion (or, more correctly, code insertion) involves two
preconditions: profitability and safety. It is profitable to insert
code at some program point if this insertion will make other code
redundant and decrease expected (or, under a stricter criterion,
absolute) execution time. It is safe to insert code at a program
point if the insertion cannot cause a program abort except in cases
where the original program would also have aborted. We refer the
reader to [AU], [Sc] or [MFS] for a discussion of various possible
criteria ensuring safety and implying profitability. In what follows
we simplify our presentation by ignoring the issue of safety, and
simply assume that any code insertion which we may want to make is
safe. (In SETL this assumption can be guaranteed by executing the
optimized program in a special run-time error mode.) However, we note
that the following discussion generalizes easily to cases in which
safety must be enforced, though in such cases it will usually be
necessary either to perform an additional backward data-flow analysis
to assess the safety of code insertion, or to restrict code insertion
to cases in which safety is assured a priori, e.g. by allowing code
insertion only at entries to loops (intervals) which must be executed
at least once, and such that every path through the loop contains the
calculation which we wish to insert at loop entry.

As to profitability, we will attempt to insert code only at interval
preheaders, and will assume that it is profitable to insert code at
entry to an interval I if there exists at least one corresponding
event in I which this insertion makes redundant. Of course, such code
insertion can increase execution time rather than decrease it in
certain unlikely cases, e.g. if the eliminated calculation is bypassed
whenever I is executed. However, if we assume that, on average, all
code within I is executed at least once whenever I is entered, then
our profitability criterion is seen to be quite reasonable.

To facilitate code motion we shall compute one additional data
object, a map z. Initially, z maps each basic block n to the set $z_n$
of all T such that n contains a T-event which becomes redundant if
the same T-event is available at entry to n, but not otherwise. (For
basic blocks these are precisely the T for which there exists an
upward-exposed T-event in n, i.e. a T-event not preceded by any
T-anti-event.) By extending the map z to intervals we will be able to
determine the code which should be inserted at each interval entry.
(Note that since intervals are identified with their own preheaders,
the map values $z_I$, where I is (the preheader of) an interval, are
also known initially, but only reflect flow through the preheader
itself.)
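
The initial value of $z_n$ for a basic block can be read off in one
left-to-right scan. The sketch below assumes the same hypothetical
('event', T) / ('anti', T) encoding of a block; it is an
illustration, not the SETL optimizer's own code.

```python
from typing import List, Set, Tuple

# Hypothetical encoding: ('event', T) or ('anti', T).
Block = List[Tuple[str, str]]


def upward_exposed(block: Block) -> Set[str]:
    """Initial z_n: every T with a T-event not preceded by a T-anti-event in n."""
    killed: Set[str] = set()
    exposed: Set[str] = set()
    for kind, t in block:
        if kind == "anti":
            killed.add(t)
        elif t not in killed:
            exposed.add(t)
    return exposed


block = [("event", "a*b"), ("anti", "c+d"), ("event", "c+d"),
         ("anti", "a*b"), ("event", "x-y")]
print(sorted(upward_exposed(block)))  # ['a*b', 'x-y']
```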


Consider the elimination phase of the forward data-flow analysis
algorithm described in Section 3. Let E be the underlying universal
set for the analysis, i.e. the set of all elements over which
bit-vectors are taken. Let I be an innermost interval (i.e. an
interval not containing any subinterval). Then, for each node
$n \in I$, we reason as follows (where the auxiliary maps $f_n$
reflect only flow through the loop of I but not through its
preheader; see Section 3):

(a) $f_n(E)$ is the set of all elements T such that if a T-event is
available at entry to the loop of I, it remains available at entry
to n.

(b) $f_n(\emptyset)$ is the set of all elements T such that a T-event
is unconditionally available at entry to n, even if it is unavailable
at entry to the loop of I.

(c) Hence, $f_n(E) - f_n(\emptyset)$ is the set of all T whose events
are available at entry to n if and only if they are available at
entry to the loop of I.

(d) Therefore, $[f_n(E) - f_n(\emptyset)] \cap z_n$ is the set of all
T such that n contains a T-event which becomes redundant if and only
if such an event is made available at entry to the loop of I.

(e) This implies that if we define

$$\mathrm{insert}(I) = \bigcup \{\, [f_n(E) - f_n(\emptyset)] \cap z_n : n \in I \,\}$$

then insert(I) is the set of all T such that I contains a T-event
which becomes redundant if T is made available at entry to the loop
of I, but not otherwise. According to our profitability criterion,
this is the set of all computations to be inserted at entry to the
loop of I (i.e. at the end of the preheader of I).
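
If each map is held as a pair (thru, gen) with
$f(x) = (\mathrm{thru} \cap x) \cup \mathrm{gen}$ (the form used in
Section 9), then $f(E) = \mathrm{thru} \cup \mathrm{gen}$ and
$f(\emptyset) = \mathrm{gen}$, so insert(I) costs a few word
operations per node. The Python sketch below uses integers as
bit-vectors; the bit assignments and the two-node interval are
hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class BitMap:
    """Bit-vector map f(x) = (thru & x) | gen; sets are encoded as int bit masks."""
    thru: int
    gen: int

    def __call__(self, x: int) -> int:
        return (self.thru & x) | self.gen


def insert_set(f: Dict[str, BitMap], z: Dict[str, int], full: int) -> int:
    """insert(I) = union over n in I of [f_n(E) - f_n(empty)] & z_n."""
    result = 0
    for n, f_n in f.items():
        result |= f_n(full) & ~f_n(0) & z[n]
    return result


# Bits: 0 = a*b, 1 = c+d, 2 = x-y.  E is the full mask.
E = 0b111
# f_n: flow from entry to the loop of I to entry to n (preheader excluded).
f = {
    "h": BitMap(thru=E, gen=0),            # the head: identity map
    "n1": BitMap(thru=0b001, gen=0b100),   # h kills c+d and generates x-y
}
z = {"h": 0b001, "n1": 0b110}              # upward-exposed events per node
print(bin(insert_set(f, z, E)))            # 0b1: only a*b is worth inserting
```

Here c+d is killed inside the loop before n1, and x-y is generated
inside the loop anyway, so only a*b qualifies for insertion.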

In order to be able to repeat these arguments for intervals I
containing, and contained in, other intervals, we must define the map
values $z_I$ appropriately. For each interval I we wish $z_I$ to be
the set of all T such that I contains a T-event which becomes
redundant if a T-event is inserted at entry to I, but not otherwise.
(These are very similar to the weakly potentially redundant
expressions discussed in [MFS].) By an argument similar to (d) above,
it is easily seen that this is accomplished by modifying the value
$z_I$ attached to the preheader of I as follows:

$$z_I \leftarrow z_I \cup \big( [f_{(I,h)}(E) - f_{(I,h)}(\emptyset)] \cap \mathrm{insert}(I) \big)$$

(where h is the interval head and, as usual, I is identified with its
own preheader). If the preheader of I is empty, this last expression
reduces to insert(I).
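
The preheader update for $z_I$ can be sketched the same way, again
assuming the hypothetical (thru, gen) encoding of a bit-vector map.
The two calls illustrate the empty-preheader case just mentioned and
a preheader that kills one of the candidate expressions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitMap:
    """Bit-vector map f(x) = (thru & x) | gen over int bit masks."""
    thru: int
    gen: int

    def __call__(self, x: int) -> int:
        return (self.thru & x) | self.gen


def extend_z(z_I: int, f_Ih: BitMap, insert_I: int, full: int) -> int:
    """z_I <- z_I | ([f_(I,h)(E) - f_(I,h)(empty)] & insert(I))."""
    return z_I | (f_Ih(full) & ~f_Ih(0) & insert_I)


E = 0b111  # bits: 0 = a*b, 1 = c+d, 2 = x-y
# Empty preheader: identity map and no events of its own, so z_I == insert(I).
print(bin(extend_z(0, BitMap(E, 0), 0b011, E)))           # 0b11
# Preheader that kills a*b and has an upward-exposed x-y event.
print(bin(extend_z(0b100, BitMap(0b110, 0), 0b011, E)))   # 0b110
```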

By carrying out this process iteratively for all intervals, from
innermost to outermost, we can compute insert(I) for all intervals I
(with the exception of the outermost interval J, for which code
motion is pointless since J does not represent a loop). The necessary
processing can be incorporated either in the elimination phase of the
data-flow algorithm or in a separate pass just after the elimination
phase (see also a remark below). A similar approach to code motion
has been devised in [MFS].

The value insert(I) defines the code to be inserted at entry to the
loop of I. However, some of this code may already be redundant at its
nominal point of insertion (either unconditionally, or because of
code motion out of some interval containing I). With this in mind, we
delay actual code insertion (motion) until the propagation phase of
our algorithm. In this phase we iterate over intervals from outermost
to innermost. When processing an interval I during this phase we will
already have computed the attribute data $x_I$ known at its entry.
This allows us to compute

$$\hat{x}_I = f_{(I,h)}(x_I) \qquad (h \text{ is the head of } I)$$

to obtain the data known at the end of the preheader of I (i.e. the
program point at which code moved out of I ought to be inserted). We
can then compute

(a) $\mathrm{insert}(I) \leftarrow \mathrm{insert}(I) - \hat{x}_I$

(b) $\hat{x}_I \leftarrow \hat{x}_I \cup \mathrm{insert}(I)$

This gives us (i) a modified descriptor insert(I) defining the code
that actually has to be inserted into the preheader of I, and (ii)
modified attribute data $\hat{x}_I$ at entry to the loop of I, which
takes the newly inserted code into account. (Note that we assume here
that the only effect of the newly created code on our analysis is to
make computations available, and that no other elements (bits) are
affected by it.) After computing $\hat{x}_I$ we then use it to
propagate attribute data to the nodes of I.
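
Once $\hat{x}_I$ is known, steps (a) and (b) amount to two bit
operations. A sketch under the same assumed (thru, gen) encoding,
with hypothetical names:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BitMap:
    """Bit-vector map f(x) = (thru & x) | gen over int bit masks."""
    thru: int
    gen: int

    def __call__(self, x: int) -> int:
        return (self.thru & x) | self.gen


def preheader_step(x_I: int, f_Ih: BitMap, insert_I: int) -> Tuple[int, int]:
    """Return (code actually inserted, data at entry to the loop of I)."""
    x_hat = f_Ih(x_I)                 # data at the end of the preheader of I
    to_insert = insert_I & ~x_hat     # (a) drop computations already available
    x_hat |= to_insert                # (b) inserted code makes them available
    return to_insert, x_hat


# a*b (bit 0) is already available on entry, so only c+d (bit 1) is inserted.
to_insert, x_hat = preheader_step(0b001, BitMap(thru=0b111, gen=0), 0b011)
print(bin(to_insert), bin(x_hat))     # 0b10 0b11
```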

This completes the description of our code motion algorithm. However,
several related issues still deserve comment. Note first that if I is
an interval, then the availability of the computations inserted at
the preheader of I is exploited only in propagating data to the nodes
of I, and not to update any flow maps describing flow through
intervals containing I. Suppose, for example, that we process the
following code:

    (1)  (while ...)
    (2)      (while ...)
    (3)          a * b
    (4)      end while;
    (5)      a * b
    (6)  end while;

Then, although we move the computation of a * b at line (3) out of
the inner loop and thereby make the computation at line (5)
redundant, our algorithm will fail to detect this fact.

Our reason for not using the 'insert' map in the more extensive
manner suggested by this example is that the expression insert(I) is
not monotone in the functions f. Thus if I is processed repeatedly,
as will be the case if I lies on a recursive cycle in an
interprocedural analysis, then using the insert(I) information (or,
more precisely, the $z_I$ information) to update flow maps might
cause the algorithm to diverge. However, in intraprocedural analysis,
and even in interprocedural analysis of recursion-free programs, in
which each interval is processed precisely once in the elimination
phase, we could improve the results of the analysis by incorporating
the 'insert' information into the flow maps. This can be done simply
as follows. Let I be an interval with head h. Once I has been
processed, we could put

$$f_{\mathrm{ins}} \leftarrow [E, \mathrm{insert}(I)]$$

$$f_{(I,h)} \leftarrow f_{\mathrm{ins}} \circ f_{(I,h)}$$

and then use this modified $f_{(I,h)}$ to compute the flow maps for
all virtual edges going out of I. However, this improvement is
probably only marginal.
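
Assuming that [A, B] denotes the bit-vector map
$x \mapsto (A \cap x) \cup B$ (the thru/gen form used in Section 9),
the update stays inside the bit-vector class, since
$(g \circ f)(x) = (\mathrm{thru}_g \cap \mathrm{thru}_f \cap x) \cup (\mathrm{thru}_g \cap \mathrm{gen}_f) \cup \mathrm{gen}_g$.
A sketch with hypothetical names and bit assignments:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitMap:
    """[thru, gen]: the bit-vector map x -> (thru & x) | gen."""
    thru: int
    gen: int

    def __call__(self, x: int) -> int:
        return (self.thru & x) | self.gen


def compose(g: BitMap, f: BitMap) -> BitMap:
    """g o f, which is again a bit-vector map."""
    return BitMap(g.thru & f.thru, (g.thru & f.gen) | g.gen)


E = 0b111
f_Ih = BitMap(thru=0b110, gen=0b000)   # the preheader kills a*b (bit 0)
insert_I = 0b011                        # a*b and c+d moved out of I
f_ins = BitMap(thru=E, gen=insert_I)    # [E, insert(I)]
f_Ih = compose(f_ins, f_Ih)             # f_(I,h) <- f_ins o f_(I,h)
print(bin(f_Ih.thru), bin(f_Ih.gen))    # 0b110 0b11
```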

A related problem is that of interprocedural code motion. Here new
and interesting possibilities arise which deserve further study. In
particular, the techniques we have described can be used to move code
out of a routine. Consider, for example, the following code (a loop
in the calling routine on the left, the procedure p on the right):

    (while ...)            proc p;
        call p                 a * b
    end while;             end proc;

Here, if we insert a * b at entry to the while loop (provided, of
course, that a and b are globals, or are made into globals), then
a * b becomes redundant in p. Interprocedural motion can be
accomplished while processing the calling routine (and after p has
been processed) by assigning $z_{r_p}$ to $z_c$, where c is the call
block containing the call to p, and $r_p$ is the entry to the
procedure p (identified with the outermost interval of p). This will
make a * b 'upward-exposed' in c, from which it can then be moved
further to the preheader of the while loop. However, if p is also
called from other points at which motion of a * b is not feasible,
then insertion of a * b at entry to the while loop will not make
a * b redundant in p. Two approaches to this situation are possible:
(i) We can make call blocks targets for code motion. That is, if c is
a call block, then we can require all computations in $z_c$ to be
inserted at entry to c, unless already available there. This
approach is capable of moving code out of a procedure to all call
points of the procedure, and even of moving some of this code further
away. (ii) We can attempt interprocedural code motion only in
situations where it is possible to test at run time for the
availability of a computation and skip recomputation if it is already
available. This is the case for the SETL copy optimization, where a
dynamic test of a 1-bit reference count is possible. In this case,
motion of a copy operation out of p to the entry of the while loop
shown in the above example can eliminate the need to copy inside p
when p is called from within that while loop, even though copying
within p may still be required if p is called from other points.

Either of these approaches must be used with caution for recursive
procedures, since in a recursive cycle of calls it becomes much more
difficult to assess the profitability of code motion between
procedures.

8. Bit-Matrix Data Flow Problems

The 'bitvectoring' analyses that we have considered in the preceding
pages are the simplest of those which belong to the general class of
flow analyses introduced by Kildall [Kl]. Their defining property is
that they deal with boolean attributes that do not interact; i.e.,
each program statement that affects an attribute either sets it or
drops it. This special property is essential to the very efficient
interprocedural analytic techniques which we have outlined. However,
there is also interest in many other, less trivial analyses, and it
is useful to review the simplest of these and comment on the
efficiency with which they can be carried out.

An important class of optimizations lying just beyond the elementary
bitvectoring class is that in which variable attributes are still
boolean, but in which the effect of code can be to set, drop, or
transfer an attribute. As an example, consider a hypothetical
language in which assignments can transfer pointers, and suppose that
we wish to determine all variable occurrences at which a given
pointer, or a member of some well-defined class of pointers, can
appear. In this situation it is natural to work with the attribute
'can-be-pointer'. Some assignments (e.g. of non-pointer constants)
will clearly kill this attribute, while others (e.g. of explicit
pointers) will set it; but the interesting new fact is that various
assignment operators, including simple assignments

    a := b

will transfer the 'can-be-pointer' attribute from b to a. To take
this into account, the effect of program flow must be described not
by a pair of bitvector coefficients as before, but by a linear
boolean mapping f(x) = Ax + b. Here A is not a bitvector of length n
(where n is the number of variable occurrences entering into the
analysis), but an n x n boolean matrix. This makes map composition
much slower than the bitvector operations on which we could rely in
the preceding sections. Moreover, an elimination approach like the
one we have described becomes infeasible, because of the large amount
of data that would have to be stored to keep coefficient matrices A
available at many program points. Thus, if we admit even this minimal
complication of the situation in which a bitvectoring approach is
possible, analysis immediately becomes much more difficult, even
though the form of the equations defining the analysis changes very
little.
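
To illustrate why composition becomes expensive, here is a small
Python sketch of a boolean affine map f(x) = Ax + b, where '+' is OR
and products are AND, with A stored as one bit mask per row.
Composing two maps requires a boolean matrix product, O(n^2) word
operations in this representation, instead of the constant number
needed for a (thru, gen) pair. The helper names and the three-variable
example are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BoolAffine:
    """f(x) = A x + b over booleans (+ is OR, product is AND).

    rows[i] is a bit mask of the j with A[i][j] = 1; x and b are bit masks.
    """
    rows: Tuple[int, ...]
    b: int

    def __call__(self, x: int) -> int:
        out = self.b
        for i, row in enumerate(self.rows):
            if row & x:
                out |= 1 << i
        return out


def compose(g: BoolAffine, f: BoolAffine) -> BoolAffine:
    """g o f = (A_g A_f) x + (A_g b_f + b_g): an n x n boolean matrix product."""
    rows = []
    for row_g in g.rows:
        acc = 0
        for j, row_f in enumerate(f.rows):
            if row_g >> j & 1:
                acc |= row_f
        rows.append(acc)
    return BoolAffine(tuple(rows), g(f.b))


def identity(n: int) -> Tuple[int, ...]:
    return tuple(1 << i for i in range(n))


def assign_copy(n: int, dst: int, src: int) -> BoolAffine:
    """dst := src transfers the attribute from src to dst."""
    rows = list(identity(n))
    rows[dst] = 1 << src
    return BoolAffine(tuple(rows), 0)


def assign_pointer(n: int, dst: int) -> BoolAffine:
    """dst := pointer sets the attribute of dst."""
    rows = list(identity(n))
    rows[dst] = 0
    return BoolAffine(tuple(rows), 1 << dst)


# Variables 0 = a, 1 = b, 2 = c.  Flow: "b := pointer; a := b".
h = compose(assign_copy(3, 0, 1), assign_pointer(3, 1))
print(bin(h(0)))  # 0b11: both a and b can be pointers
```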

It is also worth noting that many more iterations may be necessary
to attain convergence in this case than in the bitvectoring case. For
example, consider the following code:

    label:  x_1 := x_2;
            x_2 := x_3;
            ...
            x_{n-2} := x_{n-1};
            x_{n-1} := x_n;
            x_n := pointer;
            go to label;

This loop must be iterated n times for the pointer value assigned to
$x_n$ to propagate to $x_1$. The situation that confronts us here
resembles the problem of forming the transitive closure of a boolean
vector under a general boolean matrix. Of course, even in this case
we can generally expect to propagate a single boolean attribute to
all relevant parts of a program in time roughly proportional to
program length, by proceeding along chains of 'nearest occurrences'
of a single variable. However, each attribute would have to be traced
separately, since no effective algorithm allowing parallel analysis
of a whole group of attributes is known in this case. It should be
noted that these inefficiencies result from the dependence between
different attributes that we have assumed. If independence of
attribute propagation in an analysis is assumed, even non-boolean
attributes could be analyzed by elimination techniques like those
described in earlier sections, with only relatively mild degradation
of performance (see [RO, ] for example).
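
The iteration count can be checked by simulating round-robin passes
over the loop above. The sketch tracks only the single
'can-be-pointer' attribute per variable; since the entry edge
contributes nothing, the state at label is simply the state left by
the previous pass. The function name is hypothetical.

```python
def passes_until_x1_pointer(n: int) -> int:
    """Round-robin passes over the loop body until x_1 can hold the pointer.

    can[i] records whether x_{i+1} may hold a pointer; the assignments are
    processed in program order, exactly once per pass.
    """
    can = [False] * n
    passes = 0
    while not can[0]:
        passes += 1
        for i in range(n - 1):      # x_1 := x_2; ...; x_{n-1} := x_n
            can[i] = can[i + 1]
        can[n - 1] = True           # x_n := pointer
    return passes


print([passes_until_x1_pointer(n) for n in (1, 2, 5, 10)])  # [1, 2, 5, 10]
```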

Overall, we come to the pessimistic conclusion that to carry out
program analysis effectively by presently known algorithmic
techniques it is necessary either to confine oneself to analyses
which can be forced into a bitvectoring mold (or, more generally,
which deal with independent simple attributes), to analyse for
relatively small numbers of more interdependent boolean attributes,
or to work with attributes for which a crude iterative technique
converges more rapidly than worst-case theoretical arguments would
lead one to expect.

9. Applications of the General Algorithms in the SETL Optimizer

In this section we describe the specific bit-vector data-flow
problems arising in SETL optimization which are solved using the
general-purpose package of algorithms described in Sections 3-7.


I. Available Expressions Analysis. This well-known analysis works as
follows. With each well-defined expression e having no side effects
we associate a variable $v_e$ which is used to store the value of e
whenever e is computed. A redundant computation of e can then be
characterized by the property that the value of $v_e$ at a point of
computation of e is always equal to the result of computing e, so
that instead of computing e we can simply fetch and use the value of
$v_e$. (The value of a non-redundant computation of e may then have
to be stored in $v_e$ if this value will be used at some subsequent
redundant computation of e. We can determine whether such a store is
necessary either by a live-dead analysis or, more simply, by using a
modified use-definition chaining map (see below).)

Available expressions analysis is performed as follows. As an
analysis framework we use the lattice $L = 2^E$, where E is the set
of all well-defined expressions having no side effects. Meet in L is
taken to be set intersection. Each $x \in L$ denotes a set of
expressions available at some program point n, i.e. expressions e
having the property that along every execution path leading from the
program (or procedure) entry to n, e has been computed ('generated')
with no subsequent modification of the variables on which e depends
(i.e. no 'kill' of e). The set F of data-propagation maps of the
analysis consists of functions $f: L \to L$ having the form

$$f(x) = (\mathrm{thru}_f \cap x) \cup \mathrm{gen}_f, \qquad x \in L,$$

where $\mathrm{thru}_f \in L$ is the set of all expressions e which,
if available at the start of the flow described by f, are also
available at the end of that flow, and where $\mathrm{gen}_f \in L$
is the set of all expressions which are unconditionally available at
the end of that flow.
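
For a single basic block, $\mathrm{thru}_f$ and $\mathrm{gen}_f$ can
be computed in one pass over its statements. The sketch below assumes
a hypothetical statement encoding (target, expression) in which the
right-hand side is evaluated before the target is assigned, so that
an assignment such as a := a*b generates a*b and then immediately
kills it.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical statement encoding: (target, expr) stands for "target := expr";
# expr is None when the right-hand side is not one of the analysed expressions.
Stmt = Tuple[str, Optional[str]]


def block_summary(stmts: List[Stmt], operands: Dict[str, FrozenSet[str]]
                  ) -> Tuple[Set[str], Set[str]]:
    """(thru_f, gen_f) for the flow through one basic block."""
    thru = set(operands)          # start from E: nothing killed yet
    gen: Set[str] = set()
    for target, expr in stmts:
        if expr is not None:
            gen.add(expr)         # the right-hand side is evaluated first ...
        killed = {e for e, ops in operands.items() if target in ops}
        thru -= killed            # ... then the assignment kills every e using target
        gen -= killed
    return thru, gen


def apply_map(summary: Tuple[Set[str], Set[str]], x: Set[str]) -> Set[str]:
    """f(x) = (thru_f & x) | gen_f."""
    thru, gen = summary
    return (thru & x) | gen


operands = {"a*b": frozenset({"a", "b"}), "c+d": frozenset({"c", "d"})}
s = block_summary([("t", "a*b"), ("c", None), ("u", "a*b")], operands)
print(s)                          # ({'a*b'}, {'a*b'})
print(apply_map(s, {"c+d"}))      # {'a*b'}: c+d is killed by the assignment to c
```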

We invoke the algorithms described in Sections 3, 4 and 7 to perform
redundancy analysis (which is a forward analysis) and code motion.
Note that the interprocedural part of the analysis only needs to deal
with expressions which depend on at least one global variable; all
'strictly local' expressions are analyzed separately, each within its
own procedure.

For code motion we apply the algorithm of Section 7 as it stands,
ignoring the issue of safety altogether. This is possible since SETL
will execute programs in a special run-time error mode in which
erroneous computations do not cause a program abort, but instead
yield a special 'error' value. Once generated, error values propagate
through other computations as long as they are not used in branch
instructions; if they are, the program does abort. It is easily seen
that this treatment of errors allows us to insert computations safely
at any program point.

The solution map x generated by our algorithms defines the set of
expressions available at entry to each basic block n. An additional
scan through all blocks then detects and eliminates redundant
computations, and also inserts movable code into interval preheaders.

II. Modified Use-Definition Chaining Calculation. In this analysis,
which prepares data structures used in later optimizer phases, we
compute a variant of the well-known use-definition map (cf. [Al]),
which we denote 'bfrom', and which is defined so that for each use vo
of a variable V, bfrom{vo} is the set of all other occurrences
(definitions and uses) of V from which vo can be reached along a path
clear of all other occurrences of V. The classical use-definition map
is actually the transitive closure of bfrom, but we use bfrom instead
since we expect this to speed up subsequent attribute-flow analyses
(mainly type analysis), and because the bfrom map is more suitable
than the use-definition map for various other optimizations (such as
dead-code elimination).

To calculate the bfrom map, we perform a reaching occurrences
analysis. In this analysis, for each basic block n we compute the set
$x_n$ of all variable occurrences vo which can reach the start of n,
i.e. for which there exists a path leading from vo to the start of n
which is clear of any other occurrence of the same variable. This is
a forward analysis which uses the semilattice $L = 2^E$, where E is
the set of all occurrences of relevant program variables. (Again,
global variables have to be analyzed interprocedurally, whereas local
variables are analyzed intraprocedurally, each within its own
routine.) The meet in L is set union.

The space F of data-propagation maps consists of functions
$f: L \to L$ having the form

$$f(x) = (\mathrm{thru}_f \cap x) \cup \mathrm{reaching}_f, \qquad x \in L,$$

where $\mathrm{thru}_f \in L$ is the set of all variable occurrences
vo for which there exists a path through the flow described by f
which either is free of any occurrence of the associated variable V
or contains vo as the last occurrence of V, and where
$\mathrm{reaching}_f \in L$ is the set of all variable occurrences vo
for which there exists a path through the flow of f which contains vo
as the last occurrence of V.
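
The following Python sketch combines the reaching-occurrences
analysis with the final scan that produces bfrom. The occurrence
labels, the block layout and the simple round-robin iteration (in
place of the algorithms of Sections 3 and 4) are illustrative
assumptions; for simplicity, bfrom is computed for every occurrence,
not just for uses.

```python
from typing import Dict, List, Set, Tuple

Occ = Tuple[str, str]   # (variable, occurrence label); labels are hypothetical


def block_maps(block: List[Occ], universe: Set[Occ]) -> Tuple[Set[Occ], Set[Occ]]:
    """(thru_f, reaching_f) for a basic block, as defined above."""
    last: Dict[str, Occ] = {}
    for occ in block:
        last[occ[0]] = occ                     # last occurrence of each variable
    reaching = set(last.values())
    thru = {vo for vo in universe if vo[0] not in last} | reaching
    return thru, reaching


def bfrom_map(blocks: Dict[str, List[Occ]], preds: Dict[str, List[str]],
              universe: Set[Occ]) -> Dict[Occ, Set[Occ]]:
    maps = {n: block_maps(b, universe) for n, b in blocks.items()}
    x: Dict[str, Set[Occ]] = {n: set() for n in blocks}   # meet is union
    changed = True
    while changed:
        changed = False
        for n in blocks:
            new: Set[Occ] = set()
            for p in preds[n]:
                thru, reaching = maps[p]
                new |= (thru & x[p]) | reaching
            if new != x[n]:
                x[n], changed = new, True
    bfrom: Dict[Occ, Set[Occ]] = {}
    for n, block in blocks.items():            # the final scan through all blocks
        nearest: Dict[str, Set[Occ]] = {}
        for vo in x[n]:
            nearest.setdefault(vo[0], set()).add(vo)
        for occ in block:
            bfrom[occ] = set(nearest.get(occ[0], set()))
            nearest[occ[0]] = {occ}
    return bfrom


# B0 defines v; B1 uses v; B2 is empty; both branches join at B3, which uses v.
blocks = {"B0": [("v", "d1")], "B1": [("v", "u1")], "B2": [], "B3": [("v", "u2")]}
preds = {"B0": [], "B1": ["B0"], "B2": ["B0"], "B3": ["B1", "B2"]}
universe = {occ for b in blocks.values() for occ in b}
print(sorted(bfrom_map(blocks, preds, universe)[("v", "u2")]))
# [('v', 'd1'), ('v', 'u1')]
```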

After the analysis has been carried out using the algorithms of
Sections 3 and 4 (code motion is obviously meaningless for this
analysis), the computation of bfrom is completed by a straightforward
scan through all basic blocks.

III. Copy Optimization. SETL is a value language, but for efficiency
its value semantics is implemented using pointers. This usually
requires values to be copied before being modified if they are shared
by (i.e. pointed to by) several variables. In our implementation of
SETL, some of this excess value copying is suppressed by 1-bit
reference counts, known as 'share-bits', attached to each variable.
The share-bit of a value is set whenever the value is shared (which
can happen as a consequence of an assignment, embedding or retrieval
operation), and is dropped whenever a variable is assigned a newly
created value. This mechanism, though crude, does suppress most
redundant copy operations at run time. To improve program performance
still further, the SETL optimizer includes a copy optimization phase
whose goals are as follows: (a) to detect potentially destructive
value uses at which copies will never be required, and eliminate the
dynamic testing of the share-bit there; (b) to detect cases in which
a copy will always be required at a use, suppress share-bit testing,
and emit an unconditional copy instruction just before that use; (c)
to suppress the setting of share-bits that are never going to be
tested (either because no subsequent destructive uses of a particular
value occur, or because subsequent dynamic tests of a share-bit have
been eliminated by (a) and (b) above); (d) to move copy instructions
out of loops.

To achieve these goals we proceed as follows. First we perform an
available unshared values analysis. In this forward analysis we
compute, for each basic block n, the set 'unshared(n)' of all
variables whose value is definitely unshared at entry to n; in
addition, we use the code motion algorithm to move copy operations
out of loops. The framework for this analysis involves a lattice
$L = 2^E$, where E is the set of all relevant program variables and
where meet in L is set intersection, together with a space F of
data-propagation maps, where each $f \in F$ has the form

$$f(x) = (\mathrm{thru}_f \cap x) \cup \mathrm{newin}_f, \qquad x \in L.$$

Here $\mathrm{thru}_f$ is the set of all $V \in E$ such that each
path through the flow of f is either free of any set/drop of the
share-bit of V or contains a drop of that share-bit not followed by
any setting of it, and $\mathrm{newin}_f$ is the set of all
$V \in E$ such that each path through the flow of f contains a drop
of the share-bit of V not followed by any setting of that bit.
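
Because a basic block has a single path through it,
$\mathrm{thru}_f$ and $\mathrm{newin}_f$ depend only on the last
share-bit operation applied to each variable in the block. A sketch,
assuming a hypothetical ('set', V) / ('drop', V) encoding of
share-bit operations:

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: ('set', V) or ('drop', V) on the share-bit of V.
Op = Tuple[str, str]


def share_bit_summary(block: List[Op], variables: Set[str]) -> Tuple[Set[str], Set[str]]:
    """(thru_f, newin_f) for a basic block, which has exactly one path."""
    last: Dict[str, str] = {}
    for op, v in block:
        last[v] = op
    newin = {v for v, op in last.items() if op == "drop"}
    thru = {v for v in variables if v not in last} | newin
    return thru, newin


def unshared_after(block: List[Op], variables: Set[str], x: Set[str]) -> Set[str]:
    """f(x) = (thru_f & x) | newin_f."""
    thru, newin = share_bit_summary(block, variables)
    return (thru & x) | newin


block = [("drop", "s"), ("set", "t"), ("drop", "t"), ("set", "s")]
print(sorted(unshared_after(block, {"s", "t", "u"}, {"s", "u"})))  # ['t', 'u']
```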

In addition, to facilitate code motion, for each basic block n we
compute the set exposed(n) of all $V \in E$ such that n contains a
potentially destructive use of V not preceded by a set or drop of the
share-bit of V.

This analysis is performed using the algorithms of Sections 3, 4 and
7, and allows us to carry out optimizations (a) and (d) mentioned
above.

To accomplish goal (b) we perform a dual forward analysis, called
available shared values analysis, in which for each basic block n we
compute the set 'shared(n)' of all relevant variables whose value is
definitely shared at entry to n. This analysis is performed in
exactly the same way as the preceding one (but without code motion),
simply by reversing the roles of share-bit drops and settings. For
each remaining copy operation C, this analysis determines whether C
is conditional (involving share-bit testing) or unconditional (i.e.
whether the value copied by C is definitely shared); if C is
unconditional, dynamic share-bit testing is suppressed.

Finally, to accomplish goal (c), we perform a backward analysis which, for each program point, computes the set of all variables V whose share-bit value at that point reaches a point where it is tested, along a path free of any operation which sets or drops that bit. For each share-bit setting S this analysis determines whether S is really required, and suppresses S if the bit that S sets is not going to be tested subsequently. This last analysis can be viewed as a special case of live-variables analysis of the kind described in part V of this section, and is performed using essentially the same method as outlined there.
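
Within one block, a backward scan identifies the settings that can be suppressed. The scan starts from the set of variables whose share-bits may be tested beyond the block. The sketch assumes a hypothetical encoding with kinds 'set', 'drop' and 'test'. For killing purposes it treats a drop like a setting, since either one overwrites the bit.

```python
def useless_settings(block, tested_beyond):
    """Return indices of share-bit settings whose bit is never tested later.

    block is a hypothetical list of (kind, var) pairs with kind in
    'set', 'drop' or 'test'; tested_beyond holds the vars whose bit may
    be tested after the block (the analysis result at block exit).
    """
    live = set(tested_beyond)   # bits that may reach a test from this point
    useless = []
    for i in range(len(block) - 1, -1, -1):
        kind, var = block[i]
        if kind == 'test':
            live.add(var)
        elif kind in ('set', 'drop'):
            if kind == 'set' and var not in live:
                useless.append(i)
            live.discard(var)   # this operation overwrites the bit
    return sorted(useless)
```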

IV. Conversion Optimization. This analysis is required as a final step in our type-analysis and data-representation selection phases. Since SETL is dynamically typed, variable values can acquire more than one data-type or representation during program execution. Consider, for example, the case in which during execution a variable V acquires values having data-representation $R_1$, and also values having the representation $R_2$. Without optimization, this will require the compiler to treat V as having a rather general data representation, and consequently to emit somewhat inefficient code, both because (i) instructions manipulating V will have to be less specific (e.g. off-line general addition vs. the much more efficient in-line integer addition), and because (ii) data-type checks and conversions may be required prior to instructions manipulating V.

The optimizer can eliminate some of these inefficiencies by associating a data-representation with each variable occurrence in the program being analyzed, and then splitting each variable V into a series of variables $V_{R_1}, V_{R_2}, \ldots$, having representations $R_1, R_2, \ldots$, respectively. Here $R_1, R_2, \ldots$ are more specific representations computed for the occurrences of V, and all these variables share a common 'cell' in storage. Then each occurrence of V having computed representation R can be replaced by an occurrence of the 'split-variable' $V_R$. This technique enables generation of more specific, and therefore more efficient, instructions to manipulate V. However, it will also give rise to situations in which two different variables $V_{R_1}$, $V_{R_2}$ split from the same variable V are linked in data flow (e.g. $V_{R_1}$ is defined and $V_{R_2}$ is then used). In such cases we must make sure that the value of V at the second use does indeed have the required representation $R_2$. If we are unable to guarantee this assertion at compile time, we must insert an explicit data-type check/conversion of V to $R_2$ preceding the second use. This is accomplished by our conversion optimization phase.

To accomplish this task we perform a bitvectoring data-flow analysis called 'available conversions' analysis. For each basic block n, it determines the set $x_n$ of all split-variables $V_R$ which are 'available' at the start of n, i.e. those for which each execution path leading to the start of n contains an occurrence of $V_R$ not followed by any other occurrence of V. In conjunction with this analysis we use the code motion algorithm to move data-type checks and conversions out of loops.

This analysis uses the following framework. The lattice L used is $2^E$, where E is the set of all relevant split-variables, and where meet in L is set intersection. (We make the same separation between global and local variables as in reaching occurrences analysis.) The space F of data-flow maps used consists of functions $f: L \to L$ having the form

$$ f(x) = (x \cap \mathit{thru}_f) \cup \mathit{gen}_f, \qquad x \in L $$

where $\mathit{thru}_f \in L$ is the set of all split-variables $V_R$ which, if available at the start of the flow described by f, will also be available at the end of that flow. That is, each path through the flow of f must either be free of any occurrences of V, or else the last occurrence of V along that path must have the representation R (i.e. must be an occurrence of $V_R$).

Moreover, $\mathit{gen}_f \in L$ is the set of all split-variables $V_R$ which are unconditionally available at the end of the flow described by f, i.e. $V_R \in \mathit{gen}_f$ if each path through the flow corresponding to f contains an occurrence of $V_R$ not followed by any other occurrence of V.
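
For a straight-line block, $\mathit{thru}_f$ and $\mathit{gen}_f$ depend only on the representation of the last occurrence of each base variable in the block. The following is a minimal Python sketch, assuming split-variables are encoded as hypothetical (variable, representation) pairs.

```python
def conversion_summaries(block, split_vars):
    """Compute (thru_f, gen_f) for a straight-line block.

    block lists the occurrences in execution order as hypothetical
    (var, rep) pairs, i.e. occurrences of the split-variable V_rep;
    split_vars is the universe E of such pairs.
    """
    last_rep = {}   # base variable -> representation of its last occurrence
    for var, rep in block:
        last_rep[var] = rep
    # thru: V does not occur in the block, or its last occurrence is V_R.
    thru = {(v, r) for (v, r) in split_vars if last_rep.get(v, r) == r}
    # gen: the last occurrence of V in the block is V_R.
    gen = {(v, r) for (v, r) in split_vars if last_rep.get(v) == r}
    return thru, gen


def apply_map(x, thru, gen):
    """f(x) = (x & thru_f) | gen_f"""
    return (x & thru) | gen
```

At a merge point, the sets arriving along different edges are combined by intersection, the meet of this lattice.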


In applying our 'forward' algorithms to this framework, we have to consider the safety of conversion motion (insertion). Unlike insertion of ordinary computations, insertion of a conversion to a representation R can be unsafe (i.e. may cause a new program abort). As an example, consider the following code.

    read(V);
    (while ...)
        if C then
            V := V + [x];
        end if;
    end while;
    print(V);

Here, the code motion algorithm of section 7 would suggest moving the conversion of V to tuple form out of the while loop (i.e. would insert that conversion at the loop preheader). However, the value read into V might be an integer, and the condition C might be a type test which skips the concatenation in this case. Under these suppositions the original program would not have aborted, but the modified program will abort.

It is therefore necessary to perform a preliminary safety analysis before applying the code motion algorithm. This is a backward-union bitvectoring analysis. For the start of each basic block n, it determines the set $y_n$ of all split-variables $V_R$ which can safely occur at that point. That is, it calculates those $V_R$ for which all paths forward from the start of n satisfy one of two conditions. Either they lead to a use of some $V_{R_1}$ (with no intervening occurrences of V), where $R_1$ is either equivalent to or more general than the representation R, so that conversion of $V_R$ to $V_{R_1}$ will always succeed. Or they lead to a program exit, or to a redefinition of V (with no intervening occurrences of V). The framework for this safety analysis is constructed similarly to the framework of the available conversions analysis.

Having determined the sets $y_n$, we can handle safety of code motion as follows. Let $V_R$ be a split-variable that we wish to insert at (the end of) a preheader of a loop having entry node n. Then it is safe to insert $V_R$ at that point if every variable $V_{R_1}$ split from V and belonging to $y_n$ has a representation which is either equivalent to or more specific than R.
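
The insertion test can be coded directly once a "more specific than" relation on representations is available. The Python sketch below assumes a small hypothetical hierarchy in which 'integer' and 'tuple' both specialize 'general'. The real SETL representation lattice is considerably richer. $y_n$ is passed in as a set of (variable, representation) pairs.

```python
# Hypothetical representation hierarchy: each representation maps to the
# next more general one (None at the top).
PARENT = {'integer': 'general', 'tuple': 'general', 'general': None}


def at_least_as_specific(r1, r):
    """True if r1 is equivalent to r or more specific than r."""
    while r1 is not None:
        if r1 == r:
            return True
        r1 = PARENT[r1]
    return False


def safe_to_insert(var, rep, y_n):
    """Preheader insertion test for the split-variable (var, rep).

    y_n is the safety set at the loop entry node n, given as
    (var, rep) pairs; only splits of the same base variable matter.
    """
    return all(at_least_as_specific(r1, rep)
               for (w, r1) in y_n if w == var)
```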

Note also that, in conversion motion as in expression motion, the changes resulting from the insertion of conversions at an interval's preheader I need not be propagated globally to flows in intervals containing I. Such propagation is required only if it could be necessary to perform a conversion that would have been unnecessary in the original program, or if propagation would prevent additional conversion motion that would have been possible in the original program. However, it follows from the special nature of our code motion algorithm that these cases cannot occur. The proof is not difficult, but somewhat lengthy and technical, and is omitted.

Conversion optimization uses a linear scan of the code in which we compute the sets availconv(I) of all split-variables available just before an instruction I. If I uses some split-variable $V_R$ which does not belong to availconv(I), a run-time check or conversion into the $V_R$ form is required before I, and is inserted there. Otherwise no such conversion is required. Additionally, this third step inserts conversions at loop preheaders in a manner controlled by the result of the code motion phase.
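
The linear scan can be sketched as follows. The sketch assumes a hypothetical encoding of instructions as (op, variable, representation) triples, where each instruction is an occurrence of the split-variable $V_R$ it names, and op is 'use' or 'def'. After any occurrence of $V_R$, that split of V is available and no other split of V is. This matches the definition of $\mathit{gen}_f$.

```python
def insert_conversions(avail_at_entry, block):
    """Linear scan inserting run-time checks/conversions into a block.

    avail_at_entry is x_n from available conversions analysis; block is a
    hypothetical list of (op, var, rep) triples with op 'use' or 'def',
    each an occurrence of the split-variable V_rep.
    """
    avail = set(avail_at_entry)   # availconv(I) for the current instruction I
    out = []
    for op, var, rep in block:
        if op == 'use' and (var, rep) not in avail:
            out.append(('convert', var, rep))   # check/convert V into V_rep form
        out.append((op, var, rep))
        # Any occurrence of V_rep leaves exactly that split of V available.
        avail = {(w, r) for (w, r) in avail if w != var}
        avail.add((var, rep))
    return out
```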

Note that our algorithm does not allow us to produce the most specific diagnostic messages when an error occurs. Indeed, forward analysis only tells us whether or not a conversion is required at a given point P, but does not calculate the data representations possible for a variable at P. To gather this additional information, our forward analysis would have to be replaced or augmented by a forward-union analysis, in which we calculate all possible split-variables $V_R$ which can reach a particular program point. Such an analysis (quite similar to reaching occurrences analysis) would provide this extra information. In the SETL optimizer such diagnostic messages are not required at this step, as they are produced during the type-analysis phase. However, if one wished to adapt the techniques that we have sketched to other kinds of code motion and elimination (e.g. elimination and motion of range checks), such extra analysis might be appropriate.

V. Live-Dead Analysis

This classical analysis ([He], [AU]) establishes the live/dead status of variables. A variable V is said to be live at a program point n if there exists a path leading from n to some use of V which is free of any other occurrence of V (implying that the current value of V may be used subsequently, and so cannot be destroyed or discarded). Otherwise V is said to be dead at n.

Live-dead analysis has many well-known applications, such as:

(a) register allocation during code generation, since only live variables need be put in registers;
(b) dead code elimination, since operations whose output is dead upon their completion can be eliminated;
(c) static storage allocation optimization;
(d) various peephole optimizations, such as replacement of the sequence 'T := exp; A := T;' by 'A := exp;', provided that T is dead at the end of that sequence.
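
Application (d) can be sketched as a single pass over a straight-line block. The encoding is hypothetical. Each instruction is a (target, expression) pair, and dead_after[i] is the set of variables dead just after instruction i, as live-dead analysis would supply it.

```python
def fold_temp_copies(code, dead_after):
    """Peephole: 'T := exp; A := T;' becomes 'A := exp;' if T is dead after.

    code is a hypothetical list of (target, expression) pairs and
    dead_after[i] is the set of variables dead just after code[i].
    """
    out = []
    i = 0
    while i < len(code):
        if i + 1 < len(code):
            t, exp = code[i]
            a, src = code[i + 1]
            if src == t and t in dead_after[i + 1]:
                out.append((a, exp))   # T's value is never needed again
                i += 2
                continue
        out.append(code[i])
        i += 1
    return out
```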

Live variable calculation is a backward-union analysis, and is performed straightforwardly using the algorithms of sections 5 and 6. The framework used involves the lattice $L = 2^E$, where E is the set of all relevant program variables, and where lattice meet is set union. Each propagation map f acting on L has the form

$$ f(x) = (\mathit{thru}_f \cap x) \cup \mathit{livein}_f, \qquad x \in L $$

where $\mathit{thru}_f$ is the set of all variables $V \in E$ for which there exists a path through the flow of f which is either free of any occurrence of V or else contains a use of V not preceded by any other occurrence of V. Here $\mathit{livein}_f$ is the set of all $V \in E$ for which there exists a path through the flow of f which contains a use of V not preceded by any other occurrence of V.
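
For a single straight-line block there is only one path, so $\mathit{thru}_f$ and $\mathit{livein}_f$ depend only on whether the first occurrence of each variable is a use. The minimal Python sketch below assumes each block is given as a hypothetical list of ('use', V) and ('def', V) occurrences in execution order. The uses of an assignment are listed before its definition.

```python
def live_summaries(occurrences, universe):
    """Compute (thru_f, livein_f) for a straight-line block.

    occurrences is a hypothetical list of ('use', V) / ('def', V) pairs
    in execution order; the uses of an assignment come before its def.
    """
    first = {}   # variable -> kind of its first occurrence in the block
    for kind, var in occurrences:
        first.setdefault(var, kind)
    # thru: no occurrence at all, or the first occurrence is a use.
    thru = {v for v in universe if first.get(v, 'use') == 'use'}
    # livein: the first occurrence is a use.
    livein = {v for v in universe if first.get(v) == 'use'}
    return thru, livein


def live_at_entry(live_at_exit, thru, livein):
    """f(x) = (thru_f & x) | livein_f"""
    return (thru & live_at_exit) | livein
```

For 'A := B + A; C := 1' the occurrences are [('use', 'B'), ('use', 'A'), ('def', 'A'), ('def', 'C')].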

The output of live-dead analysis is a map 'liveat', mapping each basic block n to a set liveat(n) of all variables live at the start of n. These sets can then be propagated (backward) through basic blocks to establish variable liveness at any required program point.
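
Propagating liveness backward through a block is a reverse scan that applies live := (live - defs) ∪ uses at each instruction. The sketch assumes each instruction is a hypothetical (defs, uses) pair. The caller supplies the live set at the end of the block, that is, the union of liveat(s) over its successors s. The first entry of the result then coincides with liveat(n).

```python
def live_points(instructions, live_at_end):
    """Live sets just before each instruction of a block.

    instructions is a hypothetical list of (defs, uses) pairs;
    live_at_end is the union of liveat(s) over the block's successors s.
    """
    live = set(live_at_end)
    before = [None] * len(instructions)
    for i in range(len(instructions) - 1, -1, -1):
        defs, uses = instructions[i]
        # Uses are read before defs are written: kill first, then add.
        live = (live - set(defs)) | set(uses)
        before[i] = frozenset(live)
    return before
```

For 'T := x + y; A := T; B := A' with only B live at the end, T does not appear in the set before 'B := A'. This is exactly the deadness fact that the peephole rule (d) needs.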

APPENDIX A: SETL Code for the Data-Flow Algorithms


The code has been tested on a variety of test programs. The data-flow package has been used for available expression analysis, reaching occurrences analysis and live-dead analysis. (Note, however, that the code below does not contain the preparatory phase of data-flow analysis, in which the initial set of the analysis data-flow maps is computed. Nor does it show the actual invocation of the general data-flow algorithms described in this paper, or the concluding phase which utilizes the results of the analysis performed. All these phases vary substantially from one analysis to another, and are therefore left out.)


The code given here is slightly modified from the original optimizer code. The modifications are generally character-set changes, documentation upgrades and omissions of certain code segments which deal with details particular to the SETL intermediate code representation.

MODULE SETL_OPTIMIZER - INTERVAL_ANALYSIS;

$ This module contains the interval analysis algorithm described
$ in SECTION 3.

$ Flow graph analysis produces two maps which serve as input to the
$ interval analysis:

$ 1. CESSOR:       The successor map for basic blocks
$ 2. PRED:         The predecessor map for basic blocks

$ Interval analysis produces five maps:

$ 1. INTOF:        A map from each node to the interval immediately
$                  containing it.
$ 2. INTS:         Maps each routine to a tuple of all its intervals
$                  in reverse preorder. Note that iterating over
$                  INTS(ROUT) is equivalent to iterating from innermost
$                  to outermost interval.
$ 3. INT_NODES:    A map sending each interval into a tuple containing
$                  the nodes of the interval in reverse postorder.
$                  Iterating over INT_NODES(I) is equivalent to iterating
$                  forward over the nodes in I.
$ 4. PROPER_INTS:  The set of proper (reducible) intervals.
$ 5. VEDGES:       The set of all virtual edges added to the flow graph
$                  during interval analysis. A virtual edge is an edge
$                  having the form (I, V), where I is an interval
$                  and V is a node outside I which is a successor of
$                  some node in I.

$ All these variables are assumed to be globally accessible in the
$ SETL optimizer. Additional global variables that are accessed in
$ this module are:

$ ROUTS:   Set of all routines in the program being analyzed.
$ RENTRY:  Maps each routine to its entry block.
$ REXIT:   Maps each routine to its exit (return) block.
$ RSTOP:   Maps each routine to its stop block, if it exists.
$ ROUTOF:  Maps each basic block to the routine containing it.
Maps  each  basic  block  to  the  rojtine  containing  it 
$ The module contains three principal routines:

$ 1. FIND_INTERVALS:  Iterates over ROUTS, calling other routines.
$ 2. GET_GRAPH:       Builds a flow graph for a routine. Code for this
$                     routine is omitted, since it is largely trivial
$                     and contains many details special to the SETL
$                     language.
$ 3. FIND_INTS:       Finds the intervals of a flow graph.

$ The following variables are used globally during interval analysis:

VAR
    NODENO,       $ Preorder node numbering
    POSTNO,       $ Postorder numbering
    NDESCS,       $ Number of descendants of each node
    NODES,        $ Tuple of nodes in preorder
    POSTNODES,    $ Tuple of nodes in postorder
    NPRE,         $ Current position in preorder numbering
    NPOST,        $ Current position in postorder numbering
    SEEN,         $ Nodes already in spanning tree
    IMPROPERS;    $ Set of 'heads' of multiple-entry loops

PROC FIND_INTERVALS;

$ This routine iterates over all the routines in a SETL program,
$ finding the interval graph for each routine.

$ Initialize all output objects.

INTOF := INTS := VEDGES := PROPER_INTS := INT_NODES := {};
CESSOR := PRED := {};

(FORALL R IN ROUTS)
    GET_GRAPH(R);
    FIND_INTS(R);
END FORALL;

PRINT;
PRINT;
PRINT('I N T E R V A L   A N A L Y S I S');
PRINT('INTS = ', INTS);
PRINT('INT_NODES = ', INT_NODES);
PRINT('PROPER_INTS = ', PROPER_INTS);
PRINT('VEDGES = ', VEDGES);
PRINT('INTOF = ', INTOF);
PRINT('CESSOR = ', CESSOR);
PRINT('PRED = ', PRED);

END PROC FIND_INTERVALS;

PROC FIND_INTS(R);

$ This routine calculates the intervals of an intraprocedural flow
$ graph corresponding to a given routine R.

$ FIND_INTS is called once to process each procedure 'R'.
$ It produces five maps:

$ 1. INTOF:        A map from each node to its interval.
$ 2. INTS:         A map sending each routine 'R' into a tuple
$                  containing the intervals of 'R' in reverse preorder.
$                  Note that iterating backward (forward) through
$                  INTS(R) is equivalent to iterating from outermost
$                  to innermost (innermost to outermost) interval.
$                  The outermost interval is not really an interval
$                  at all. Instead it contains all nodes not contained
$                  in other intervals. It is acyclic in the reducible
$                  case.
$ 3. INT_NODES:    A map sending each interval into a tuple containing
$                  the nodes of the interval in reverse postorder.
$                  Iterating over INT_NODES(I) is equivalent to iterating
$                  forward over the nodes in I.
$ 4. VEDGES:       The set of all edges which are part of some higher
$                  order graph.
$ 5. PROPER_INTS:  A set of all proper (reducible) intervals.

$ STEP 1: Calculate the following objects:

$ 1. NODENO:     Maps each node into its preorder index
$ 2. POSTNO:     Maps each node into its postorder index
$ 3. NDESCS:     Maps each node into the number of its descendants
$ 4. NODES:      Tuple of nodes in preorder
$ 5. POSTNODES:  Tuple of nodes in postorder
$ 6. BACKINV:    The set of all [Y, X] such that [X, Y] is a back edge
$ 7. TARGBACK:   A tuple of targets of back edges in reverse preorder

$ (1) - (5) are built by an auxiliary depth-first searching routine
$ 'DFST'. When we build the node indices we use only even numbers.
$ This leaves the odd numbers for target blocks (i.e. interval
$ preheaders). Initially only the even elements of NODES and POSTNODES
$ are filled in.

$ The following macro tests for tree descendancy:

MACRO IS_DESC(X, Y);
    (NODENO(Y) <= NODENO(X) AND NODENO(X) <= NODENO(Y) + NDESCS(Y))
ENDM;

DFST(R);        $ Construct a depth-first spanning tree.

$ Construct the set BACKINV of all reversed back edges.

BACKINV := {[Y, X] IN PRED ST ROUTOF(Y) = R AND IS_DESC(X, Y)};

$ Construct the tuple TARGBACK of all back edge target nodes, arranged
$ in reverse preorder.

TARGBACK := [NODES(I) : I := #NODES, #NODES - 1 ... 1 ST
             NODES(I) /= OM AND NODES(I) IN DOMAIN BACKINV];

$ STEP 2

$ At this point 'TARGBACK' contains all potential interval heads in
$ reverse preorder. We iterate over X in TARGBACK doing three things:

$ 1. Build the set 'IMPROPERS' of such nodes X which are heads of
$    multiple-entry loops, and thus are 'sources of irreducibility'.

$ 2. For each X find the set 'REACHUNDER' of nodes (in the reduced
$    graph in which each already processed proper or improper
$    interval has been logically 'squashed', i.e. identified with
$    a single node - its target block) which reach X along a path
$    not passing through X whose final edge is a back edge. If any
$    node which is not a descendant of X belongs to 'REACHUNDER',
$    then X is a head of a multiple-entry loop, and we add X to
$    'IMPROPERS'. Otherwise X is a head of a single-entry loop,
$    and thus is an interval in our sense; if
$    REACHUNDER * IMPROPERS = {}, then that interval is a proper
$    interval, and we add it to 'PROPER_INTS'; otherwise it is an
$    improper interval.

$ 3. If X is an interval head then:
$    a. Create a new target block 'TBX'.
$    b. Add TBX to 'INTS(R)' and set INT_NODES(TBX) to [].
$    c. For all Y in REACHUNDER set INTOF(Y) := TBX.
$    d. Update the flow graph to show the insertion of TBX.

ROOT := RENTRY(R);

INTS(R) := [];
IMPROPERS := {};     $ Heads of multiple-entry loops found so far in R

(FORALL X IN TARGBACK)

    REACHUNDER := {X};

    NEWREACHUNDER := {INTOF .LIM Y : Y IN BACKINV{X}} - {X};
    $ INTOF .LIM Y is the largest interval constructed so far which
    $ contains Y (see below for details).

    (WHILE NEWREACHUNDER /= {})
        Y FROM NEWREACHUNDER;       $ Get a new element of 'REACHUNDER'.
        REACHUNDER WITH Y;

        IF NOT IS_DESC(Y, X) THEN   $ We have a multiple-entry loop.
            IMPROPERS WITH X;
            QUIT WHILE;             $ Exit the while loop.
        ELSE
            NEWREACHUNDER +:=
                ({INTOF .LIM Z : Z IN PRED{Y}} - REACHUNDER);
        END IF;
    END WHILE;

    IF X IN IMPROPERS THEN CONTINUE FORALL; END;

    $ Here X is an interval head.

    TBX := GET_TARG(X);

    $ The GET_TARG routine creates a new basic block, initially containing
    $ only a label and an unconditional jump to X. Code for this routine
    $ is omitted here.

    $ Insert TBX in its proper place in the tree, and initialize its
    $ attributes.

    NODENO(TBX) := NODENO(X) - 1;
    NODES(NODENO(TBX)) := TBX;
    POSTNO(TBX) := POSTNO(X) + 1;
    POSTNODES(POSTNO(TBX)) := TBX;

    $ Note that there is no need to compute NDESCS(TBX), as this value
    $ will not be used later.

    INT_NODES(TBX) := [];
    $ TBX represents the interval with head X.

    INTS(R) WITH TBX;

    $ Check if TBX is proper.

    IF REACHUNDER * IMPROPERS = {} THEN
        PROPER_INTS WITH TBX;
    END IF;

    $ Map each node in REACHUNDER to its containing interval TBX.

    (FORALL Y IN REACHUNDER) INTOF(Y) := TBX; END;

    $ Update the flow graph to account for the insertion of TBX into it.
    $ This involves the following actions:

    $ 1. Add an edge [TBX, X] to the graph.
    $ 2. Replace all edges entering the interval through 'X' by edges
    $    entering TBX, and change the corresponding branch instructions
    $    in the program code.
    $ 3. For each edge [U, V] leaving the interval whose head is X, add
    $    a 'virtual' edge [TBX, V] to the graph. This edge is added to
    $    'VEDGES'.

    UPDATE(X, TBX, REACHUNDER);

END FORALL;

$ Build the outermost 'interval', identified by the entry node 'ROOT'.

INTS(R) WITH ROOT;
INT_NODES(ROOT) := [];

PROPER_INTS WITH ROOT;    $ ROOT will be removed from this set if it
                          $ is actually improper.

$ Iterate over the nodes in reverse postorder, adding each node to
$ INT_NODES. If a node's interval is undefined, put it in the
$ outermost interval.

(FOR I := #POSTNODES, #POSTNODES - 1 ... 1)
    X := POSTNODES(I);
    IF X = OM THEN CONTINUE FOR; END;

    HD := INTOF(X);
    IF HD = OM THEN
        HD := INTOF(X) := ROOT;
        IF X IN IMPROPERS THEN
            PROPER_INTS LESS ROOT;
        END IF;
    END IF;

    INT_NODES(HD) WITH X;
END FOR;

END PROC FIND_INTS;

PROC UPDATE(X, TBX, INODES);

$ This routine updates the flow graph to show the insertion of
$ the target block 'TBX'. Its arguments are:

$ X:       The interval head
$ TBX:     The target block
$ INODES:  The nodes in the interval

$ In this code, only manipulation of the flow graph is shown; code
$ manipulating individual instructions within blocks is omitted.

CESSOR{TBX} WITH X;
PRED{X}     WITH TBX;

$ Next we iterate over all the predecessors of X which are not in
$ the interval, modifying the CESSOR and PRED maps as we go.

(FORALL Y IN PRED{X} ST Y NOTIN INODES + {TBX})
    CESSOR{Y}   LESS X;
    CESSOR{Y}   WITH TBX;
    PRED{X}     LESS Y;
    PRED{TBX}   WITH Y;
END FORALL;

$ Find all edges which leave the interval and add a virtual edge
$ from TBX for each such edge.

(FORALL U IN INODES, Y IN CESSOR{U}
        ST Y NOTIN INODES AND INTOF(Y) /= U)
    CESSOR{TBX} WITH Y;
    PRED{Y}     WITH TBX;
    VEDGES      WITH [TBX, Y];
END FORALL;

END PROC UPDATE;
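
The edge bookkeeping performed by UPDATE can be mirrored in Python with CESSOR and PRED held as dictionaries of sets. This is a sketch under that representation assumption. Like the SETL routine, it ignores the instructions inside blocks, and the lower-case names are hypothetical stand-ins for the SETL globals.

```python
def update(x, tbx, inodes, cessor, pred, intof, vedges):
    """Insert target block tbx in front of interval head x (edges only).

    cessor and pred are dicts of sets (assumed representation); inodes
    is the node set of the interval (REACHUNDER in FIND_INTS).
    """
    cessor.setdefault(tbx, set()).add(x)
    pred.setdefault(x, set()).add(tbx)
    # Redirect every edge entering the interval through x from outside.
    for y in list(pred[x]):
        if y not in inodes and y != tbx:
            cessor[y].discard(x)
            cessor[y].add(tbx)
            pred[x].discard(y)
            pred.setdefault(tbx, set()).add(y)
    # Add a virtual edge (tbx, y) for every edge leaving the interval.
    for u in inodes:
        for y in list(cessor.get(u, ())):
            if y not in inodes and intof.get(y) != u:
                cessor[tbx].add(y)
                pred.setdefault(y, set()).add(tbx)
                vedges.add((tbx, y))
```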

PROC DFST(R);

$ This routine builds the depth-first spanning tree for a routine
$ 'R'. We initialize counters for the various node indices and then
$ call 'DFST1' to do the recursive tree walk.

NODENO    := {};
POSTNO    := {};
NDESCS    := {};
NODES     := [];
POSTNODES := [];
SEEN      := {};

NPRE := NPOST := 0;

DFST1(RENTRY(R));

END PROC DFST;

PROC DFST1(X);

$ This routine builds the depth-first spanning subtree rooted at the
$ node 'X'.

NODENO(X) := (NPRE +:= 2);     $ Note the use of even indices only.
NDESCS(X) := 0;
NODES(NPRE) := X;

SEEN WITH X;

(FORALL Y IN CESSOR{X} ST Y NOTIN SEEN)
    DFST1(Y);
    NDESCS(X) +:= (NDESCS(Y) + 2);
    $ Each node is counted as two descendants, to match the use of
    $ only even indices in NODENO and POSTNO.
END FORALL;

POSTNO(X) := (NPOST +:= 2);
POSTNODES(NPOST) := X;

END PROC DFST1;
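
A Python transliteration of DFST and DFST1, together with the IS_DESC test, shows why counting each descendant twice keeps the descendant test exact when only even indices are used. The node names and the dictionary-of-lists form of CESSOR are assumptions of this sketch. Successor order is taken from the input lists, whereas SETL iterates over a set, so other spanning trees are equally valid.

```python
def dfst(entry, cessor):
    """Depth-first spanning tree with even preorder/postorder indices.

    cessor maps each node to a list of its successors (an assumption of
    this sketch; the SETL code uses a multi-valued map).
    """
    nodeno, postno, ndescs = {}, {}, {}
    nodes, postnodes = {}, {}   # index -> node; odd indices stay free
    seen = set()
    npre = npost = 0

    def dfst1(x):
        nonlocal npre, npost
        npre += 2                   # even indices only
        nodeno[x] = npre
        nodes[npre] = x
        ndescs[x] = 0
        seen.add(x)
        for y in cessor.get(x, []):
            if y not in seen:       # re-tested for each successor
                dfst1(y)
                ndescs[x] += ndescs[y] + 2   # each descendant counts twice
        npost += 2
        postno[x] = npost
        postnodes[npost] = x

    dfst1(entry)
    return nodeno, postno, ndescs, nodes, postnodes


def is_desc(x, y, nodeno, ndescs):
    """True if x is y itself or a descendant of y in the spanning tree."""
    return nodeno[y] <= nodeno[x] <= nodeno[y] + ndescs[y]
```

In the graph a→b, a→d, b→c, c→b, c→d, the edge c→b is a back edge because is_desc('c', 'b', ...) holds. This is the test used to build BACKINV.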

OP .LIM(F, X);

$ This operator finds a value 'Y' such that Y = F(F(...F(X)...))
$ and F(Y) = OM.

$ Note that unlike Tarjan's original approach we omit path
$ compression, tree balancing, etc. for the sake of simplicity,
$ though these could easily be added.

Y := X;

(WHILE F(Y) /= OM)
    Y := F(Y);
END;

RETURN Y;

END OP .LIM;
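
.LIM can be written in Python as a simple loop. The second function sketches the path compression that the comment above mentions as an easy addition. It rewrites the map it walks, so here it would have to operate on a private copy of INTOF, since INTOF itself must keep recording the immediately enclosing interval. Both functions are illustrations and are not part of the optimizer.

```python
def lim(f, x):
    """Follow f from x until reaching y with f(y) undefined (OM)."""
    y = x
    while f.get(y) is not None:
        y = f[y]
    return y


def lim_compressed(f, x):
    """Like lim, but repoints every node on the path directly at the result.

    This mutates f, so it should be applied to a private copy of the
    interval map rather than to INTOF itself.
    """
    root = lim(f, x)
    while f.get(x) is not None and f[x] != root:
        nxt = f[x]
        f[x] = root
        x = nxt
    return root
```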

MODULE SETL_OPTIMIZER - DATAFLOW_SOLVER;

$ This module contains a package of general purpose
$ routines to solve bit vector data flow problems either
$ intraprocedurally or interprocedurally. We can distinguish
$ between four basic types of such analyses, according to
$ the character of the desired analysis:

$ FORWARD  - Data is to be propagated in the direction of
$            the flow, from procedure entries forward.
$ BACKWARD - Data is to be propagated in the reverse
$            direction of the flow, from exits backward.
$ MEET     - Whenever two paths converge (for forward analysis)
$            or diverge (for backward analysis), take the meet (set
$            intersection) of the data values propagated along these
$            paths.
$ JOIN     - As in MEET, except that the join (set union) of the
$            corresponding data values is to be taken.

$ Typical examples are: expression availability analysis
$ is a forward-meet analysis; unconditional exposure
$ of expressions (also known as 'very busy' expressions
$ analysis) is a backward-meet analysis; reaching
$ definitions analysis is a forward-join analysis; and
$ live variables analysis is a backward-join analysis.

$ As noted in sections 5 and 6, forward and backward analyses
$ require substantially different logic, so that each of them
$ is executed in a different subpackage. However, the
$ difference between meet and join problems turns out to
$ be rather minor, so that both can be handled by
$ the same (forward or backward) package, using a switch
$ to indicate whether a particular analysis is of meet or
$ join type.
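
The remark that meet and join problems differ only by a switch can be illustrated with a small round-robin solver over maps of the form f(x) = (x ∩ thru) ∪ gen. This solver does plain iteration to a fixed point and is swapped in for illustration only. It is not the interval-based algorithm of this package, and all names in it are hypothetical.

```python
def solve(nodes, succ, maps, universe, forward=True, meet_flag=True,
          boundary=frozenset()):
    """Round-robin solver for maps f_n(x) = (x & thru_n) | gen_n.

    maps[n] = (thru_n, gen_n). Returns (before, after): before[n] is the
    value entering n's map (start of n if forward, end of n if backward)
    and after[n] = f_n(before[n]). Nodes with no incoming data (entries
    for forward, exits for backward) are held at `boundary`.
    """
    pred = {n: set() for n in nodes}
    for m in nodes:
        for n in succ.get(m, ()):
            pred[n].add(m)
    # Data arrives from predecessors (forward) or successors (backward).
    sources = pred if forward else {n: set(succ.get(n, ())) for n in nodes}
    # Meet problems start from the full set, join problems from the empty set.
    start = frozenset(universe) if meet_flag else frozenset()
    before = {n: frozenset(boundary) if not sources[n] else start
              for n in nodes}

    def apply(n, x):
        thru, gen = maps[n]
        return (x & thru) | gen

    after = {n: apply(n, before[n]) for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if not sources[n]:
                continue    # boundary value stays fixed
            vals = [after[m] for m in sources[n]]
            # The MEET_FLAG switch: intersection for meet, union for join.
            new = (frozenset.intersection(*vals) if meet_flag
                   else frozenset().union(*vals))
            if new != before[n]:
                before[n] = new
                after[n] = apply(n, new)
                changed = True
    return before, after
```

Calling it with forward=False and meet_flag=False, and with each block's (thru_f, livein_f) as its map, gives live-variables analysis. Calling it with forward=True and meet_flag=True gives an available-expressions style problem.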

$    This    module    exports    the    fillowing    procedures: 


$    C5RAPH_ANALYSIS 
$ 


Call  g'apti  anaL/sis  routiie*  to  pe  called 
once  before  solving  any  data  flow  proolera 
inter pr PC ed J ra I ly. 


S    INTERPR0C_FWD_ANALYSIS    -    Call    t  fi  i  s    to    solve    i  nt  e  r  pr  PC  ed  J  r  al 
$  forward    data    flow    analyses. 

$   I1\ITRAPR0C_FWD_ANALYSIS    -    :alL    this    to    solve    1  nt  rapr  oc  ed  J  r  al 
$  forward    data    flow    analysis    for    a    given 

$  procedure. 
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$  INTERPROC_BACK_ANALysIS  -  CaLL  this  to  solve  i nt erproceduraL 

$  backward  data  flow  analysis 

$  IMTRAPROC_BACK_ANALYSIS  -  P^rfoms  i  n  t  rapro:edLir  a  I  aackiiard 

t  analysis  for  3  jiven  procedure. 

$  This  package  assjmes  the  folloMlng  jlobal  oijects  to  be 
$  aval  table : 

$  CGRAPH  -  The  program  call  graph*  represented  as  a  set 

$  of  edges;  an  ;dg?  (>t3)  is  in  C3RAPH  iff  :>  i  s  a 

$  procedure  which  contains  a  call  to  the  procedure  Q. 

%   ROUTS  -  Set  of  all  program  arocedures  (1«e.  all  nodes 

$  ofthecallji-aji). 

$  SYM_MAIN  -  Main-program  identifier  (i.e.  tie  entry  node  of  the 

$  call  gra3h). 

$  ROUTOF  -  Maps  each  black  ta  the  procedure  containing  it. 

$  RENTRY  -  Maps  each  procedure  to  its  entry  block. 

$  REXIT  -  maps  each  procedure  to  its  exit  (return)  block. 

$  RSTOP  -  Haps  each  procedure  to  its  stop  block»  if  any. 

$  CALLSIN  -  Maps  each  procedure  to  the  set  of  all  call  blocks 

$  in  it. 

S  CALLPROC  -  Maps  eaci  caLL  jLjcc  t3  the  proredur?  it  calls. 

$  CESSOR   - The program flow graph, as a union of the flow
$             graphs of all procedures. An edge [M, N] is in
$             CESSOR iff either M contains a branch to N, or else
$             M is a call block and N is the block immediately
$             following M. The nodes of the flow graph are either
$             basic blocks or derived intervals (which are
$             represented by their target blocks), in which case
$             an edge [INT, V] in CESSOR can indicate the possibility
$             of a transfer of control from the interval INT to a
$             successor V of some node in INT. These edges are called
$             virtual edges (see the interval analysis package for
$             more details).

$  PRED     - The inverse map of CESSOR.

$  INTS     - Maps each procedure to the tuple of its intervals
$             in reverse preorder (relative to a depth first
$             spanning tree of its flow graph).


$  INT_NODES   - Maps each interval to the sequence of its nodes
$                in interval order (i.e. reverse postorder).

$  PROPER_INTS - The set of all proper intervals (those which do
$                not contain irreducible nuclei).

$  INTOF       - Maps each flow graph node to the interval containing
$                it.

$  VEDGES      - Set of all virtual edges (see the description of
$                CESSOR above).


$  In addition, this module uses the following global-within-the-module
$  variables, the first three of which are used to transmit flags and
$  analysis constants between inner routines, while the rest are built
$  by a recursive depth-first search procedure during call-graph
$  analysis, and are used later in that analysis.


VAR
ID,            $ Identity flow map
ZERO,          $ Null data state
MEET_FLAG,     $ TRUE if meet analysis; otherwise FALSE
SEEN,          $ Procedures already in DFST of CGRAPH
CNPRE,         $ Current preorder index in DFST
CNPOST,        $ Current postorder index in DFST
NODENO,        $ Preorder numbering map
POSTNO,        $ Postorder numbering map
NDESCS;        $ Number of descendants map
PROC CGRAPH_ANALYSIS;

$  This procedure performs the call graph analysis needed for
$  our interprocedural data flow analysis solver. It computes
$  the following objects:

$  CG_SCCS   - A tuple of (roots of the) strongly connected
$              components of CGRAPH, arranged in reverse postorder.

$  SCC_NODES - Maps each (root of a) strongly connected component
$              into a tuple containing its nodes in reverse
$              postorder.

$  SCC_D     - Maps each (root of a) strongly connected component
$              S into an estimate of its loop-interconnectedness
$              parameter D, defined as the maximal number of back
$              edges along any acyclic path in S (we do not attempt to
$              obtain that precise value, but rather use a crude
$              upper bound for it, namely the number of back
$              edge targets contained in S).

$  Begin by calling a standard depth first spanning tree
$  routine, which will compute the following objects:

$  NODENO - Preorder node numbering map.
$  POSTNO - Postorder node numbering map.
$  NDESCS - Number of descendants map.

CDFST;

$  Tree-descendancy macro, identical to the one used for interval
$  analysis.

MACRO IS_DESC(P, Q);    $ Test whether P is a descendant of Q
(NODENO(P) >= NODENO(Q) AND NODENO(P) <= NODENO(Q) + NDESCS(Q))
ENDM;

$  Next compute some auxiliary objects:

INVERSE := {[Q, P] : [P, Q] IN CGRAPH};    $ Inverse call graph
INVPOSTNODES := {[#ROUTS+1-N, P] : N := POSTNO(P)};
$  Procedures in their reverse postorder
BACKINV := {[P, Q] IN INVERSE ST IS_DESC(Q, P)};
$  Set of all inverse back edges
TARGBACK := DOMAIN BACKINV;    $ Back edge targets

CG_SCCS := [];    $ See above
SCC_NODES := SCC_D := {};    $ See above
SCCROOT := {};    $ Strongly connected component root map

$  Iterate through the procedures, looking for strongly
$  connected components.

(FOR I := 1 ... #INVPOSTNODES)
P := INVPOSTNODES(I);
IF SCCROOT(P) = OM THEN    $ We have a new root of a S.C.C.

SCCROOT(P) := P;
CG_SCCS WITH P;    $ P corresponds to the new component
SCC_NODES(P) := [P];

IF P IN TARGBACK THEN    $ This is a non-trivial S.C.C.
NEWNODES := BACKINV{P} - {P};
$  New nodes to be added to the S.C.C.

SCC_D(P) := 1;
$  Develop count of no. of backedge targets in the S.C.C.
(WHILE NEWNODES /= {})
Q FROM NEWNODES;
SCCROOT(Q) := P;    $ Mark Q as belonging to the SCC
IF Q IN TARGBACK THEN SCC_D(P) +:= 1; END;
NEWNODES +:= {R IN INVERSE{Q} ST
IS_DESC(R, P) AND SCCROOT(R) = OM};
END WHILE;

ELSE    $ We have a trivial S.C.C.
SCC_D(P) := 0;
END IF;

ELSE    $ P belongs to a SCC already scanned
SCC_NODES(SCCROOT(P)) WITH P;
END IF;
END FOR;


PRINT(' ');
PRINT('CALL GRAPH ANALYSIS');
PRINT(' ');
PRINT('CG_SCCS =', CG_SCCS);
PRINT('SCC_NODES =', SCC_NODES);
PRINT('SCC_D =', SCC_D);

END PROC CGRAPH_ANALYSIS;

PROC CDFST;

$  This routine builds the depth first spanning tree of the call
$  graph. We initialize counters for the various node numberings
$  and then call 'CDFST1' to do the recursive tree walk.

NODENO := NDESCS := POSTNO := {};
SEEN := {};
CNPRE := CNPOST := 0;
CDFST1(SYM_MAIN);

END PROC CDFST;

PROC CDFST1(P);

$  This routine builds the depth first spanning tree starting with
$  node 'P'. This routine differs in various details from the depth
$  first spanning tree routine used for interval analysis.

NODENO(P) := (CNPRE +:= 1);
NDESCS(P) := 0;
SEEN WITH P;

(FORALL Q IN CGRAPH{P} ST Q NOTIN SEEN)
CDFST1(Q);
NDESCS(P) +:= (NDESCS(Q) + 1);
END FORALL;

POSTNO(P) := (CNPOST +:= 1);

END PROC CDFST1;
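As a cross-check of CGRAPH_ANALYSIS, CDFST and CDFST1, the following Python transliteration computes the same objects on a small call graph. It is a sketch under stated assumptions: the call graph is a set of (caller, callee) string pairs, only procedures reachable from the main program are numbered, and the lower-case Python names are hypothetical counterparts of the SETL objects. The recursion mirrors CDFST1, so a very deep call graph would need an explicit stack.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def cdfst(cgraph: Set[Edge], main: str):
    """Preorder numbers, postorder numbers and descendant counts of a DFST."""
    succ: Dict[str, List[str]] = {}
    for p, q in sorted(cgraph):               # sorted only to make runs repeatable
        succ.setdefault(p, []).append(q)
    nodeno: Dict[str, int] = {}
    postno: Dict[str, int] = {}
    ndescs: Dict[str, int] = {}
    count = {"pre": 0, "post": 0}

    def cdfst1(p: str) -> None:
        count["pre"] += 1
        nodeno[p] = count["pre"]
        ndescs[p] = 0
        for q in succ.get(p, []):
            if q not in nodeno:               # the domain of nodeno plays the role of SEEN
                cdfst1(q)
                ndescs[p] += ndescs[q] + 1
        count["post"] += 1
        postno[p] = count["post"]

    cdfst1(main)
    return nodeno, postno, ndescs


def cgraph_analysis(cgraph: Set[Edge], main: str):
    nodeno, postno, ndescs = cdfst(cgraph, main)

    def is_desc(p: str, q: str) -> bool:      # is P a DFST descendant of Q?
        return nodeno[q] <= nodeno[p] <= nodeno[q] + ndescs[q]

    inverse = {(q, p) for p, q in cgraph}
    invpostnodes = sorted(postno, key=lambda p: -postno[p])   # reverse postorder
    backinv = {(p, q) for p, q in inverse if is_desc(q, p)}
    targback = {p for p, _ in backinv}

    cg_sccs: List[str] = []
    scc_nodes: Dict[str, List[str]] = {}
    scc_d: Dict[str, int] = {}
    sccroot: Dict[str, str] = {}
    for p in invpostnodes:
        if p in sccroot:                      # P belongs to an SCC already scanned
            scc_nodes[sccroot[p]].append(p)
            continue
        sccroot[p] = p                        # P is the root of a new SCC
        cg_sccs.append(p)
        scc_nodes[p] = [p]
        scc_d[p] = 0
        if p in targback:                     # non-trivial SCC
            scc_d[p] = 1
            newnodes = {q for r, q in backinv if r == p} - {p}
            while newnodes:
                q = newnodes.pop()
                sccroot[q] = p
                if q in targback:             # count back edge targets
                    scc_d[p] += 1
                newnodes |= {r for x, r in inverse
                             if x == q and is_desc(r, p) and r not in sccroot}
    return cg_sccs, scc_nodes, scc_d


# Hypothetical call graph: a and b are mutually recursive, c calls itself.
CGRAPH = {("main", "a"), ("a", "b"), ("b", "a"), ("a", "c"), ("c", "c")}
cg_sccs, scc_nodes, scc_d = cgraph_analysis(CGRAPH, "main")
```

For this graph, a and b form one component with a single back edge target (SCC_D = 1), and the self-recursive c forms another.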

PROC INTERPROC_FWD_ANALYSIS(RW F, WR SOLN, ID_PRM, ZERO_PRM,
MEET_FLAG_PRM, MOVE_CODE,
RW EXPOSED, WR INSERT);

$  Note declarations of 'read-write' parameters ('RW') and 'write-only'
$  parameters ('WR').

$  This is the master routine to perform a specific data flow
$  analysis interprocedurally. Its parameters are:

$  F     - Maps each edge [M, N] in the flow graph to a compact
$          representation of its data-propagation map F(M,N).
$          Initially this information has to be provided only
$          for basic blocks (but not for call blocks); the
$          first phase of the analysis will fill
$          in additional entries. Each F(M,N) is represented
$          as a pair [A, B] in L x L, such that for each X in L
$          F(M,N)(X) = X*A + B, and A contains B (this latter
$          condition ensures that the representation is unique,
$          and also simplifies some functional manipulations).

$  SOLN  - The solution vector for the analysis. SOLN maps each
$          flow graph node to the data found to be known at its
$          entry.

$  The next three parameters are transmitted internally between
$  subprocedures by assigning them to global variables, as they
$  are constant per analysis. The corresponding globals are:

$  ID        - The identity map representation. ID = [U, {}], where
$              U is the universal set over which bitvectors are taken
$              in this analysis (e.g. set of all program expressions,
$              set of all variables, etc.).

$  ZERO      - The initial data value, i.e. flow data assumed at the main
$              program entry.

$  MEET_FLAG - A flag indicating whether the analysis is a meet
$              analysis or a join analysis.

$  AUX_F     - These are auxiliary propagation maps. For each flow
$              graph node U, AUX_F(U) denotes the effect of propagation
$              from the entry to I, the interval containing U, through
$              I to the entry of U.

$  MOVE_CODE - A flag indicating that code motion is required.

$  EXPOSED   - This is initially the set of computations (corresponding
$              to elements (bits)) exposed at the start of each
$              block N (i.e. computed with no prior kill in N). A
$              later phase of our analysis attaches an 'EXPOSED' set
$              to each interval processed. EXPOSED(I) is the set of all
$              computations T for which there exists a computation of T
$              in the interval I which would become redundant if and
$              only if T became available at the entry to (the target
$              block of) I. Note, however, that the logical place at
$              which code movable out of an interval I should be
$              inserted is the end of the target block of I, rather
$              than its start, so that if the target block is nonempty
$              then EXPOSED(I) may not represent those movable
$              computations. For this reason we provide the parameter
$              'INSERT', which gives the set of movable code.


$  INSERT    - This output parameter will map each interval into
$              the set of all computations movable out of its loop,
$              which are to be inserted at the end of the target
$              block of the interval. The actual insertion should be
$              performed by the calling procedure.


$  Our analysis procedures make frequent use of the following
$  operators (which could also be written as macros, if it were
$  not for the convenience of the infix notation that we prefer
$  to use):

$  .COMP        Functional composition
$  .MEETJOIN    Functional meet or join, depending on MEET_FLAG
$  .MJV         Meet or join of lattice values
$  .OF          Functional application

$  All these operators have elementary set expressions; see below
$  for details.


$  Note also that these operators must be prepared to handle
$  undefined flow values, which will be represented
$  by a special constant 'FOM'; for example,

$        G .COMP FOM = FOM .COMP G = FOM;
$        (concatenation of an undefined flow with a defined one is
$        still undefined)

$        G .MEETJOIN FOM = FOM .MEETJOIN G = G.
$        (a join or a meet of an undefined flow with a defined flow
$        yields the defined flow.)

$  Another special constant, 'XOM', is used to denote the undefined
$  data state in L.
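For readers who want to experiment with the [A, B] representation, here is a minimal Python sketch of the four operators. The closed forms for .COMP and .MEETJOIN are derived here from F(X) = X*A + B with B contained in A, not quoted from the SETL source; using None for both FOM and XOM, treating XOM as neutral in .MJV, and all Python names are assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, Optional, Tuple

Bits = FrozenSet[str]
FlowMap = Tuple[Bits, Bits]    # [A, B] meaning X -> X*A + B, with B a subset of A
FOM = None                     # stand-in for the undefined flow map
XOM = None                     # stand-in for the undefined data state


def make_map(a: Iterable[str], b: Iterable[str]) -> FlowMap:
    """Build [A, B], enlarging A to contain B so the representation is unique."""
    fa, fb = frozenset(a), frozenset(b)
    return (fa | fb, fb)


def comp(g: Optional[FlowMap], f: Optional[FlowMap]) -> Optional[FlowMap]:
    """G .COMP F: apply F first, then G; composing with FOM yields FOM."""
    if g is FOM or f is FOM:
        return FOM
    (a1, b1), (a2, b2) = f, g
    return ((a1 & a2) | b2, (b1 & a2) | b2)


def meetjoin(f: Optional[FlowMap], g: Optional[FlowMap], meet: bool) -> Optional[FlowMap]:
    """F .MEETJOIN G: pointwise intersection (meet) or union (join); FOM is neutral."""
    if f is FOM:
        return g
    if g is FOM:
        return f
    (a1, b1), (a2, b2) = f, g
    return (a1 & a2, b1 & b2) if meet else (a1 | a2, b1 | b2)


def of(f: Optional[FlowMap], x: Optional[Bits]) -> Optional[Bits]:
    """F .OF X: functional application (undefined if either operand is)."""
    if f is FOM or x is XOM:
        return XOM
    a, b = f
    return (x & a) | b


def mjv(x: Optional[Bits], y: Optional[Bits], meet: bool) -> Optional[Bits]:
    """X .MJV Y: meet or join of lattice values; XOM is treated as neutral."""
    if x is XOM:
        return y
    if y is XOM:
        return x
    return x & y if meet else x | y


# Available-expressions style example over U = {a+b, c*d} (hypothetical data).
U = frozenset({"a+b", "c*d"})
gen_ab = make_map(U, {"a+b"})          # block that computes a+b and kills nothing
kill_cd = make_map({"a+b"}, set())     # block that assigns c, killing c*d
both = comp(kill_cd, gen_ab)           # gen_ab followed by kill_cd
```

Since B is kept inside A (and A inside U), two flow maps are equal exactly when their pairs are equal, so a plain equality test is enough for the convergence checks in the routines of this package.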

$  Transfer constant parameters to globals

ID := ID_PRM;
ZERO := ZERO_PRM;
MEET_FLAG := MEET_FLAG_PRM;

$  The master procedure consists of the following three phases:
$  Interprocedural elimination phase

AUX_F := INTERPROC_FWD_ELIMINATE(F);

$  If code motion is required then perform an additional
$  phase, computing the sets of movable code.

IF MOVE_CODE THEN
INSERT := {};
(FORALL P IN ROUTS)
PROPAGATE_EXPOSED(P, F, AUX_F, EXPOSED, INSERT);
END FORALL P;
END IF;

$  Find data at procedure entries

ENT_INF := ENTRY_INFO(F, AUX_F, INSERT);

$  Final propagation phase

SOLN := {};    $ Initialize the solution

(FORALL P IN ROUTS)
FWD_PROPAGATE_IN(P, F, AUX_F, SOLN, ENT_INF(P),
MOVE_CODE, INSERT);
END FORALL;

RETURN;

END PROC INTERPROC_FWD_ANALYSIS;

PROC INTERPROC_FWD_ELIMINATE(RW F);

$  This is the driver routine for the first interprocedural
$  inner-to-outer interval pass. Procedures are analyzed in
$  the following order: we process the strongly connected
$  components of the call graph in their postorder; for each
$  such component, we iterate through its procedures in their
$  postorder, no more than 2*D+1 times, where D is the
$  loop-interconnectedness parameter of the component.

AUX_F := {};    $ Initialize auxiliary maps

$  Iterate through the S.C.C.s of CGRAPH

(FOR I := #CG_SCCS, #CG_SCCS-1 ... 1)

SCC := CG_SCCS(I);    $ Get a S.C.C.
SCC_PROCS := SCC_NODES(SCC);    $ Procs in that S.C.C.
FLOW_FLAG := 'FIRST_INTER';    $ First processing of SCC

(FOR J := 1 ... 2 * SCC_D(SCC) + 1 UNTIL PROC_CONVERGE)

PROC_CONVERGE := TRUE;

(FOR K := #SCC_PROCS, #SCC_PROCS-1 ... 1)

P := SCC_PROCS(K);
PROC_CONVERGE :=
INTRAPROC_FWD_ELIMINATE(P, AUX_F, F, FLOW_FLAG)
AND PROC_CONVERGE;

$  The INTRAPROC_FWD_ELIMINATE routine analyzes P; its fourth parameter
$  indicates whether the analysis is first-time interprocedural,
$  second-time interprocedural or intraprocedural; it returns a flag
$  to indicate whether information in P has stabilized.

END FOR K;

FLOW_FLAG := 'SECOND_INTER';    $ Additional passes thru SCC

END FOR J;

END FOR I;

RETURN AUX_F;

END PROC INTERPROC_FWD_ELIMINATE;
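The visiting order used by INTERPROC_FWD_ELIMINATE (components in postorder, procedures within a component in postorder, at most 2*D+1 passes per component) can be sketched in Python as below. The callback stands in for INTRAPROC_FWD_ELIMINATE; it, the trace list and the sample SCC data are hypothetical, and the actual per-procedure work is abstracted away.

```python
from typing import Callable, Dict, List, Tuple


def interproc_fwd_eliminate(cg_sccs: List[str],
                            scc_nodes: Dict[str, List[str]],
                            scc_d: Dict[str, int],
                            eliminate: Callable[[str, str], bool]) -> None:
    """Drive eliminate(p, flow_flag) in the order of INTERPROC_FWD_ELIMINATE."""
    for scc in reversed(cg_sccs):            # CG_SCCS is in reverse postorder
        procs = scc_nodes[scc]               # also in reverse postorder
        flow_flag = "FIRST_INTER"
        for _ in range(2 * scc_d[scc] + 1):
            proc_converge = True
            for p in reversed(procs):        # procedures in postorder
                # Call first, so every procedure is processed on every pass.
                proc_converge = eliminate(p, flow_flag) and proc_converge
            flow_flag = "SECOND_INTER"
            if proc_converge:                # the UNTIL PROC_CONVERGE test
                break


# Hypothetical stand-in for INTRAPROC_FWD_ELIMINATE: it records each call and
# reports convergence on every pass after the first.
trace: List[Tuple[str, str]] = []


def fake_eliminate(p: str, flow_flag: str) -> bool:
    trace.append((p, flow_flag))
    return flow_flag == "SECOND_INTER"


interproc_fwd_eliminate(["main", "a", "c"],
                        {"main": ["main"], "a": ["a", "b"], "c": ["c"]},
                        {"main": 0, "a": 1, "c": 1},
                        fake_eliminate)
```

The non-recursive main program gets a single pass (2*0+1), while each recursive component is revisited until its procedures report stabilization or the bound is reached.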

PROC INTRAPROC_FWD_ELIMINATE(P, RW AUX_F, RW F, FLOW_FLAG);

$  This routine performs an intraprocedural elimination phase
$  for the procedure P, using interval analysis. The fourth parameter
$  indicates whether this routine has been invoked by the
$  intraprocedural solver or by the interprocedural solver, and,
$  in the second case, whether this is the first time P is
$  being processed or not.

$  In this pass we iterate through the procedure's intervals
$  in an inner-to-outer order (i.e. in reverse preorder of their
$  heads in a DFST of the flow graph of P). For each interval
$  I processed in this manner we compute a set of data-propagation
$  maps of the form F(I, U), where

$  (1) If U is in I, then this map is an auxiliary map (which will
$  be denoted as AUX_F(U), I being implicit in this case) which
$  represents the propagation effect as control advances from
$  the start of I, thru I, to the start of U;

$  (2) If U is not in I, then U is a successor of some node in I.
$  Here the map F(I, U) represents the propagation effect as control
$  advances from the start of I, through I, to the start of U;
$  in this case F(I, U) is needed for the processing of the
$  intervals containing I. Note that [I, U] is a virtual edge
$  in our flow graph; thus the elimination phase extends the
$  map F so as to be defined also on virtual edges.

$  Any interval I processed in this routine is either a proper
$  strongly connected interval, or, if it contains 'improper'
$  nodes (i.e. nuclei of irreducibility), is a single-entry
$  strongly connected subgraph. In the first case we only have to
$  iterate thru the nodes of I twice, but in the second case till
$  convergence.

$  The outermost 'interval' is either a single entry acyclic
$  graph (if it does not contain irreducible nuclei), or a
$  general single-entry graph otherwise. For this 'interval' we
$  iterate either once in the first case, or till convergence
$  otherwise.

$  If the present routine is to be used for interprocedural analysis,
$  we first reset the propagation maps for call blocks in P. If none of
$  these maps have changed from the last processing of P,
$  then obviously analysis of P has stabilized and we can return
$  immediately. Moreover, intervals need be re-processed if and only
$  if they contain a call block whose local effect has changed,
$  or, recursively, contain an interval whose local effects
$  have changed. In terms of the 'INTOF' tree, we only have to
$  re-analyze intervals lying along some path from the
$  root to a call block whose local effect has changed. This
$  can make reprocessing of a procedure considerably
$  faster than initial processing.

IF FLOW_FLAG = 'SECOND_INTER' THEN
$  Process only intervals containing calls with new effect
NEED_PROCESS := {};
ELSE
$  Process all intervals
NEED_PROCESS := {INTT : INTT IN INTS(P)};
END IF;

IF FLOW_FLAG /= 'INTRA' THEN
$  Interprocedural analysis.

(FORALL C IN CALLSIN(P))

V := CESSOR(C);    $ The block following the call
P1 := CALLPROC(C);    $ C calls P1
EP1 := REXIT(P1);    $ The return block of P1

$  (Note here that if this routine is modified to include
$  parameter-passing assignments as part of call blocks, in the manner
$  suggested in a concluding remark in SECTION 4, then one might
$  manipulate AUX_F(EP1), which defines the local effect of executing
$  P1, to get F([C, V]), rather than just assign the first map to the
$  second one, as is done below.)

IF F([C, V]) /= AUX_F(EP1) THEN
$  Update flow function for call
F([C, V]) := IF AUX_F(EP1) = OM THEN FOM
ELSE AUX_F(EP1) END;
$  Interval containing call must be processed
NEED_PROCESS WITH INTOF(C);
END IF;

END FORALL C;

$  If no intervals need be processed then information has
$  stabilized and no reprocessing of P need be done.

IF NEED_PROCESS = {} THEN RETURN TRUE; END;

END IF;

P_INTS := INTS(P);    $ Intervals of P in reverse preorder
OUTINT := P_INTS(#P_INTS);    $ Outermost interval

(FORALL INTT := P_INTS(K) ST INTT IN NEED_PROCESS)

NEED_PROCESS WITH INTOF(INTT);    $ Process containing interval
NODES := INT_NODES(INTT);    $ Nodes of INTT in interval order
HEAD := NODES(1);    $ Interval head
AUX_F(HEAD) := ID;    $ Initialize AUX_F of HEAD to the identity

$  Note here that the edge [INTT, HEAD] is a real edge in the
$  flow graph, so that F([INTT, HEAD]) will have been pre-computed in
$  an initialization phase, along with the flow maps for all other
$  real edges, and is therefore available here.

$  Three cases are now possible:

$  (1) INTT is proper, but not outermost; then iterate twice.
$  (2) INTT is proper, and is outermost; then iterate once.
$  (3) INTT is improper; iterate until convergence (1 + the number
$      of nodes is an adequate upper bound on the iterations).

$  (Note that we do not make use of the better upper bound on
$  the number of iterations discussed in SECTION 3.)

CONV_CONTROL := INTT NOTIN PROPER_INTS;
$  Test for convergence only in this case

N_ITER :=    $ Maximal number of iterations
IF INTT NOTIN PROPER_INTS THEN #NODES + 1
ELSEIF INTT = OUTINT THEN 1 ELSE 2 END;

$  If improper interval, initialize AUX_F of all non-head nodes
$  to 'FOM'. This is because we cannot guarantee in this case that,
$  when propagating data to a node within INTT, all its predecessors
$  (within INTT) have already been processed, so that we have to
$  prepare for the case where some of these predecessors still
$  have undefined auxiliary data-flow maps.

IF CONV_CONTROL THEN
(FOR J := 2 ... #NODES)
AUX_F(NODES(J)) := FOM;
END FOR;
END IF;

$  Iterate through nodes of INTT.

(FOR D := 1 ... N_ITER UNTIL CONVRGD)

CONVRGD := CONV_CONTROL;

$  Iterate thru nodes of INTT, other than HEAD

(FOR J := 2 ... #NODES)
ND := NODES(J);
FTEMP := .MEETJOIN/ {F([PND, ND]) .COMP AUX_F(PND) : PND IN PRED{ND}
ST INTOF(PND) = INTT};
PRINT('AUX_F(', ND, ') =', FTEMP);
CONVRGD := CONVRGD AND (FTEMP = AUX_F(ND));
AUX_F(ND) := FTEMP;
END FOR J;

$  Test if processing of INTT has terminated

IF D = N_ITER OR CONVRGD THEN QUIT FOR D; END;

$  Re-compute AUX_F(HEAD), taking back edges into account

FTEMP := .MEETJOIN/ {F([PND, HEAD]) .COMP AUX_F(PND) :
PND IN PRED{HEAD} ST INTOF(PND) = INTT};
$  Note that a meet/join over an empty set yields OM

FTEMP := IF FTEMP = OM THEN AUX_F(HEAD)
ELSE AUX_F(HEAD) .MEETJOIN FTEMP END;
IF NOT CONV_CONTROL THEN
CONVRGD := (AUX_F(HEAD) = FTEMP);
END IF;

AUX_F(HEAD) := FTEMP;
END FOR D;

$  Compute F([INTT, V]), where V is a successor of some node in
$  INTT; note that this loop will be null for the
$  outermost interval.

(FORALL V IN VEDGES{INTT})
FTEMP := .MEETJOIN/ {F([PV, V]) .COMP AUX_F(PV) :
PV IN PRED{V} ST INTOF(PV) = INTT};
F([INTT, V]) := FTEMP .COMP F([INTT, HEAD]);
END FORALL V;

END FORALL INTT;
RETURN FALSE;    $ To indicate no convergence.

END PROC INTRAPROC_FWD_ELIMINATE;
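The per-interval work of INTRAPROC_FWD_ELIMINATE is sketched below in Python for a single interval. The sketch assumes the [A, B] bit-vector representation described earlier, uses None for FOM, expects the interval's nodes in interval order, and returns the AUX_F maps together with the virtual-edge maps F([INTT, V]). The call-block reset and the NEED_PROCESS bookkeeping are omitted, and every Python name and the sample loop are hypothetical.

```python
from functools import reduce
from typing import Dict, List, Optional, Tuple, FrozenSet

Bits = FrozenSet[str]
FlowMap = Tuple[Bits, Bits]      # [A, B] meaning X -> X*A + B, with B a subset of A
FOM = None                       # stand-in for the undefined flow map


def comp(g, f):
    """G .COMP F: apply F first, then G; FOM absorbs."""
    if g is FOM or f is FOM:
        return FOM
    (a1, b1), (a2, b2) = f, g
    return ((a1 & a2) | b2, (b1 & a2) | b2)


def meetjoin(f, g, meet: bool):
    """Pointwise meet or join of two maps; FOM is neutral."""
    if f is FOM:
        return g
    if g is FOM:
        return f
    (a1, b1), (a2, b2) = f, g
    return (a1 & a2, b1 & b2) if meet else (a1 | a2, b1 | b2)


def eliminate_interval(intt: str, nodes: List[str], pred: Dict[str, List[str]],
                       f: Dict[Tuple[str, str], FlowMap], intof: Dict[str, str],
                       ident: FlowMap, proper: bool, outermost: bool,
                       exits: List[str], meet: bool = True):
    """Process one interval; nodes[0] is the head."""
    head = nodes[0]
    aux: Dict[str, Optional[FlowMap]] = {head: ident}
    conv_control = not proper            # test convergence only if improper
    n_iter = len(nodes) + 1 if not proper else (1 if outermost else 2)
    if conv_control:
        for nd in nodes[1:]:
            aux[nd] = FOM

    def incoming(v: str):
        """.MEETJOIN/ of F([PV, V]) .COMP AUX_F(PV) over PV in PRED{V} within INTT."""
        maps = [comp(f[(pv, v)], aux.get(pv, FOM))
                for pv in pred.get(v, []) if intof.get(pv) == intt]
        return reduce(lambda x, y: meetjoin(x, y, meet), maps, FOM)

    for d in range(1, n_iter + 1):
        convrgd = conv_control
        for nd in nodes[1:]:
            ftemp = incoming(nd)
            convrgd = convrgd and ftemp == aux.get(nd)
            aux[nd] = ftemp
        if d == n_iter or convrgd:
            break
        ftemp = incoming(head)               # take back edges into account
        ftemp = aux[head] if ftemp is FOM else meetjoin(aux[head], ftemp, meet)
        if not conv_control:
            convrgd = aux[head] == ftemp
        aux[head] = ftemp
        if convrgd:                          # the UNTIL CONVRGD test
            break

    exit_maps = {v: comp(incoming(v), f[(intt, head)]) for v in exits}
    return aux, exit_maps


# Hypothetical example: target block t, loop h -> b -> h, exit h -> x.
U = frozenset({"e", "g"})
ID = (U, frozenset())
B_MAP = (frozenset({"g"}), frozenset({"g"}))      # block b kills e and computes g
F = {("t", "h"): ID, ("h", "b"): ID, ("b", "h"): B_MAP, ("h", "x"): ID}
PRED = {"h": ["t", "b"], "b": ["h"], "x": ["h"]}
INTOF = {"h": "t", "b": "t", "t": "outer", "x": "outer"}
aux, exit_maps = eliminate_interval("t", ["h", "b"], PRED, F, INTOF, ID,
                                    proper=True, outermost=False, exits=["x"])
```

In this available-expressions example the loop kills e, so only g can survive from the interval entry to the head and to the exit x, which is exactly the map [{g}, {}].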

PROC PROPAGATE_EXPOSED(P, RW F, AUX_F, RW EXPOSED, RW INSERT);

$  This procedure performs an inner-to-outer pass over all
$  intervals to determine the computations which might be moved
$  out of the loop of each interval I. As explained above,
$  these computations are not necessarily those exposed in I;
$  hence, we build up both sets 'EXPOSED' and 'INSERT'
$  simultaneously.

$  In this analysis, the set of computations movable out of the
$  loop of I is obtained by taking all computations T with
$  the property that there exists a node ND in I such that
$  T is exposed in ND and is available at the start of ND iff
$  it is available at the end of the target block of I.

$  The movable code is always assumed to be appended to the
$  end of the target block of the interval, to avoid any possible
$  conflict with code that is already present in the target block.
$  However, this appending takes place physically only at the end
$  of the elimination phase. Thus, we do not attempt to make
$  use of the fact that these expressions are potentially
$  available at the head of I in updating any flow function.
$  This approach is necessary to ensure convergence of our algorithms
$  in cases of recursive cycles of interprocedural flow.

P_INTS := INTS(P);    $ Intervals of P in reverse preorder

$  First extend F to indicate null flow from the entry block to
$  itself. Since the outermost interval has no target block,
$  and is therefore identified with its head, this trick unifies
$  the treatment of that interval with the treatment of inner
$  intervals, as shown below.

OUTINT := P_INTS(#P_INTS);
F([OUTINT, OUTINT]) := ID;

(FORALL INTT := P_INTS(K))

NODES := INT_NODES(INTT);
HEAD := NODES(1);

$  In computing EXPOSED(INTT), we must reckon with the fact
$  that the target block of INTT (also denoted by INTT)
$  might be non-empty, due to prior code motion. This can mean that
$  (a) F([INTT, HEAD]) is not the identity, and (b) EXPOSED(INTT)
$  (where INTT is treated as a basic block) is not null
$  initially.

$  We proceed as follows: first find all exposed computations in
$  the loop of INTT, assuming the target block of INTT to be null.
$  These are the computations movable out of the loop of INTT.

INSERT(INTT) := +/ {EXPOSED(ND) *
(AUX_F(ND)(1) - AUX_F(ND)(2)) : ND IN NODES};

$  Next find the new set of computations which are still exposed
$  at the entry to the target block of INTT.

FTARG := F([INTT, HEAD]);
EXPFROMENTRY := INSERT(INTT) * (FTARG(1) - FTARG(2));

$  Add these computations to those exposed in the target block
EXPOSED(INTT) := EXPOSED(INTT) + EXPFROMENTRY;

END FORALL INTT;
RETURN;

END PROC PROPAGATE_EXPOSED;
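For a single interval, the two set formulas of PROPAGATE_EXPOSED can be exercised directly. The Python sketch below assumes the [A, B] representation, in which A - B is the set of bits a map passes through unchanged; the helper name and the sample data are hypothetical, and the surrounding loop over P_INTS is omitted.

```python
from typing import Dict, FrozenSet, List, Tuple

Bits = FrozenSet[str]
FlowMap = Tuple[Bits, Bits]      # [A, B] meaning X -> X*A + B


def propagate_exposed_one(intt: str, nodes: List[str], aux: Dict[str, FlowMap],
                          f_target: FlowMap, exposed: Dict[str, Bits],
                          insert: Dict[str, Bits]) -> None:
    """INSERT(INTT) and EXPOSED(INTT) for one interval (hypothetical helper)."""
    # A computation T is movable out of the loop if it is exposed in some node
    # ND and AUX_F(ND) passes T through unchanged, i.e. T is in A - B.
    insert[intt] = frozenset().union(
        *(exposed[nd] & (aux[nd][0] - aux[nd][1]) for nd in nodes))
    # Those that also pass unchanged through the target block F([INTT, HEAD])
    # are still exposed at the entry to the target block.
    a, b = f_target
    expfromentry = insert[intt] & (a - b)
    exposed[intt] = exposed.get(intt, frozenset()) | expfromentry


U = frozenset({"e", "g"})
ID = (U, frozenset())
AUX = {"h": ID, "b": (frozenset({"e"}), frozenset())}    # g is killed before b
EXPOSED = {"h": frozenset({"e"}), "b": frozenset({"e", "g"}), "t": frozenset()}
INSERT: Dict[str, Bits] = {}
propagate_exposed_one("t", ["h", "b"], AUX, ID, EXPOSED, INSERT)

# Same loop, but with a target block that kills e (hypothetical).
EXPOSED2 = {"h": frozenset({"e"}), "b": frozenset({"e", "g"}), "t": frozenset()}
INSERT2: Dict[str, Bits] = {}
propagate_exposed_one("t", ["h", "b"], AUX, (frozenset({"g"}), frozenset()),
                      EXPOSED2, INSERT2)
```

In both runs e is movable out of the loop, but it remains exposed at the entry of the target block only when that block does not kill it.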

PROC ENTRY_INFO(F, AUX_F, INSERT);

$  This function calculates and returns a mapping which sends
$  each procedure P into the flow information available at entry
$  to P. It is called (only in the interprocedural case) just
$  before we begin the final outer-to-inner propagation phase.

$  First we construct a map 'CGF' assigning to each edge [P, Q]
$  of the call graph a data-propagation map, describing the
$  propagation effect as control advances from the entry of P
$  to the entry of Q via any call to Q from P.

CGF := {};
(FORALL [P, Q] IN CGRAPH) CGF([P, Q]) := FOM; END;

(FORALL Q := CALLPROC(C))    $ For all calls within all procedures

P := ROUTOF(C);    $ [P, Q] is an edge of the call graph

$  Compute the local effect as control advances from the entry
$  of P to C.

FTEMP := AUX_F(C);

(INIT IU := INTOF(C); WHILE IU /= RENTRY(P))

HIU := INT_NODES(IU)(1);    $ Head of IU

$  Add the effect of code moved out of IU
FINS := [ID(1), INSERT(IU)];

FTEMP := FTEMP .COMP FINS .COMP F([IU, HIU])
.COMP AUX_F(IU);

IU := INTOF(IU);
END;
CGF([P, Q]) := CGF([P, Q]) .MEETJOIN FTEMP;

END FORALL Q;

$  Next we iterate through the call graph in 'invocation order', i.e.
$  process the strongly connected components in reverse postorder
$  and the set of procedures within each strongly connected
$  component in reverse postorder also.

ENT_INF := {[P, XOM] : P IN ROUTS};    $ Initialize solution
ENT_INF(SYM_MAIN) := ZERO;

CGRINV := {[P, Q] : [Q, P] IN CGRAPH};

(FOR I := 2 ... #CG_SCCS)    $ Pick S.C.C.'s in reverse postorder.

$  Note that we assume here that the main program is non-recursive,
$  so that the first strongly-connected component of the call
$  graph consists of the main program only. Thus we can skip it,
$  for the entry value of the main program is already assumed
$  known.

SCC := CG_SCCS(I);
SCC_PROCS := SCC_NODES(SCC);    $ Procs in SCC in rev. postorder

(FOR N := 1 ... SCC_D(SCC) + 1 UNTIL CONVRGD)
CONVRGD := TRUE;

(FORALL P := SCC_PROCS(K))

TEMP := .MJV/ {CGF([Q, P]) .OF ENT_INF(Q) :
Q IN CGRINV{P}};

$  Test for convergence

CONVRGD := CONVRGD AND (TEMP = ENT_INF(P));
ENT_INF(P) := TEMP;

END FORALL P;

END FOR N;

END FOR I;

RETURN ENT_INF;

END PROC ENTRY_INFO;
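The second half of ENTRY_INFO, which propagates entry values across the call graph once CGF is known, can be sketched in Python as follows. The assumptions are these: CGF is supplied directly (the interval walk that builds it is omitted), undefined maps and values are both represented by None, the SCC_D + 1 iteration bound is copied from the listing, and the sample call graph, maps and function names are hypothetical.

```python
from functools import reduce
from typing import Dict, FrozenSet, List, Optional, Tuple

Bits = FrozenSet[str]
FlowMap = Tuple[Bits, Bits]      # [A, B] meaning X -> X*A + B
XOM = None                       # undefined data state (FOM is also None here)


def of(f: Optional[FlowMap], x: Optional[Bits]) -> Optional[Bits]:
    if f is None or x is XOM:
        return XOM
    a, b = f
    return (x & a) | b


def mjv(x: Optional[Bits], y: Optional[Bits], meet: bool = True) -> Optional[Bits]:
    if x is XOM:
        return y
    if y is XOM:
        return x
    return x & y if meet else x | y


def entry_info(cg_sccs: List[str], scc_nodes: Dict[str, List[str]],
               scc_d: Dict[str, int], cgf: Dict[Tuple[str, str], FlowMap],
               main: str, zero: Bits, meet: bool = True) -> Dict[str, Optional[Bits]]:
    """Iterate CGF over the SCCs in invocation order (reverse postorder)."""
    ent_inf: Dict[str, Optional[Bits]] = {
        p: XOM for s in cg_sccs for p in scc_nodes[s]}
    ent_inf[main] = zero
    callers: Dict[str, List[str]] = {}   # CGRINV{P}
    for p, q in cgf:
        callers.setdefault(q, []).append(p)
    for scc in cg_sccs[1:]:              # assumes CG_SCCS(1) is the non-recursive main
        for _ in range(scc_d[scc] + 1):
            convrgd = True
            for p in scc_nodes[scc]:     # reverse postorder within the SCC
                temp = reduce(lambda x, y: mjv(x, y, meet),
                              (of(cgf[(q, p)], ent_inf[q]) for q in callers.get(p, [])),
                              XOM)
                convrgd = convrgd and temp == ent_inf[p]
                ent_inf[p] = temp
            if convrgd:
                break
    return ent_inf


U = frozenset({"e", "g"})
ID = (U, frozenset())
CGF = {("main", "a"): (U, frozenset({"e"})),    # main computes e before calling a
       ("a", "b"): ID, ("b", "a"): ID, ("a", "c"): ID,
       ("c", "c"): (U, frozenset({"g"}))}       # c computes g before recursing
ent = entry_info(["main", "a", "c"],
                 {"main": ["main"], "a": ["a", "b"], "c": ["c"]},
                 {"main": 0, "a": 1, "c": 1},
                 CGF, "main", frozenset())
```

With a meet analysis, g never reaches the entry of c because the first (non-recursive) call from a does not carry it, while e, computed in main, reaches every other procedure.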
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PROC    FWD_PROPAGATE_IN(Pt     RU    F,     AUX_F,    RW    SOLNt    ENT.VAL. 

M0VE_C0OEt     RU    INSERT); 

%   This    procedure    performs    ) jt ?r-to-1 nns r  propaqatlon    for    a 

$    routine    Pt    using   the    'Int erwaL-ef f ec t •  flow    functions    AUX_F 

$    to    modify    the    soljt1>n    bjd    •33L\J*.    The  parameter    EMT_»/AL 

$    gives    the    flow    Inforsatlai    assui^d    (or  known)    at    procedure 
$   entry. 

$    If    code    motion    Is    reqj1r;Jt    tiei    t\M    computations    in    INSERTCI} 

$    are    assumed    to    be    ava1lai)Le   at    the    end    of    the    target    block 

$    of    an    Interval    I    (but    only    for    the    purpose    of    propagation 

$    inside    I).    In   additi3Tf    c  juajtat  i  ons     in    IiMSERTCI}    already 

S    available    at    exit    from    the    target    block   of   I    are    removed    fram 

%    INSERTCI>. 

$    Note    that    movable    coiipjt3ti9n3    are    assumed   to    as    sjch    that    the 
$    insertion   of    any   of    them    iiill    not    •kill*    any    others. 

SOLN(RENTRYCP))     :=    EMT.rf^L'i 

P_INTS    :=    INTSCP);      t    Irjtervals    of    ^    in   reverse    preorder 

S   Extend   F    to    indicate    null    flow    from    the    entry    block    to 
$    Itself.    Since    the    ojtsrmut    iit^rval    has    no    targfft    3lock« 
$    and    is    therefore    Identified   with    its    head*    this    trick   unifies 
$    the    treatment    of    that    interval    with    the    treatment    of    inner 
$    intervals*    as    shown    3?larf. 

OUTINT := P_INTS(#P_INTS);
F([OUTINT, OUTINT]) := ID;

(FOR K := #P_INTS, #P_INTS-1 ... 1)

INTT := P_INTS(K);
NODES := INT_NODES(INTT);    $ Nodes of INTT
SOLN1 := SOLN(INTT);    $ Data value at entry to INTT

$  Convert SOLN1 to the data attribute value at the end of the target
$  block of INTT.

$  Propagate through the target block of INTT; if INTT = OUTINT, the
$  trick noted above will make the following statement a no-op.

SOLN1 := F([INTT, NODES(1)]) .OF SOLN1;

$  If code motion is also required, then update INSERT(INTT)
$  and add it to SOLN1.

IF MOVE_CODE AND INTT /= OUTINT THEN
INSERT(INTT) := INSERT(INTT) - SOLN1;
SOLN1 := SOLN1 + INSERT(INTT);
END IF;

$  Now propagate attributes to the nodes of INTT

(FORALL U IN NODES)
SOLN(U) := AUX_F(U) .OF SOLN1;
END FORALL U;

END FOR;

RETURN;

END PROC FWD_PROPAGATE_IN;
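A Python sketch of FWD_PROPAGATE_IN for one procedure follows. It assumes the [A, B] representation, intervals named by their target blocks (with the outermost interval named by the entry block), a tuple of intervals in reverse preorder, and hypothetical sample data; INSERT is updated in place as in the listing.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Bits = FrozenSet[str]
FlowMap = Tuple[Bits, Bits]      # [A, B] meaning X -> X*A + B


def of(f: FlowMap, x: Bits) -> Bits:
    """F .OF X for the [A, B] representation."""
    a, b = f
    return (x & a) | b


def fwd_propagate_in(p_ints: List[str], int_nodes: Dict[str, List[str]],
                     aux: Dict[str, FlowMap], f: Dict[Tuple[str, str], FlowMap],
                     universe: Bits, ent_val: Bits,
                     insert: Optional[Dict[str, Bits]] = None) -> Dict[str, Bits]:
    """Outer-to-inner propagation; p_ints lists intervals innermost first."""
    outint = p_ints[-1]                             # identified with the entry block
    soln: Dict[str, Bits] = {outint: ent_val}
    f = dict(f)
    f[(outint, outint)] = (universe, frozenset())   # null flow through the entry
    for intt in reversed(p_ints):                   # outer-to-inner
        nodes = int_nodes[intt]
        soln1 = of(f[(intt, nodes[0])], soln[intt]) # through the target block
        if insert is not None and intt != outint:   # code motion requested
            insert[intt] = insert[intt] - soln1
            soln1 = soln1 | insert[intt]
        for u in nodes:
            soln[u] = of(aux[u], soln1)
    return soln


U = frozenset({"e", "g"})
ID = (U, frozenset())
INT_NODES = {"entry": ["entry", "t"], "t": ["h", "b"]}
AUX = {"entry": ID, "t": (U, frozenset({"g"})),     # the entry block computes g
       "h": ID, "b": (frozenset({"e"}), frozenset())}
INSERT = {"t": frozenset({"e"})}                    # e was moved out of loop t
soln = fwd_propagate_in(["t", "entry"], INT_NODES, AUX, {("t", "h"): ID},
                        U, frozenset(), INSERT)
```

Here e is not yet available at the end of target block t, so it stays in INSERT(t) and is treated as available inside the loop, while g arrives from the entry block.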
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PROC    INTRAPROC_FWD_ANALTSIS(Pt    RW    F,    WR    SDLNf    IO_PRMf    ZER3_PRH» 

Mr:£r_FLAG_PRM,     MOWE_CODE» 
Rd    IXPOSEOt    A\    INSIRT); 

$ This is the master routine to perform a specific data flow
$ analysis intraprocedurally for a given routine P, within which
$ local variables are analyzed.

$ For more details, comments, and a description of parameters, see
$ the corresponding interprocedural analyser.

ID := ID_PRM;

MEET_FLAG := MEET_FLAG_PRM;

AUX_F := {};

FLAG := INTRAPROC_FWD_ELIMINATE(P, AUX_F, F, 'INTRA');
$ The return value of that procedure is not used in this case.

IF MOVE_CODE THEN
INSERT := {};
PROPAGATE_EXPOSED(P, F, AUX_F, EXPOSED, INSERT);

END IF;
SOLN := {};

FWD_PROPAGATE_IN(P, F, AUX_F, SOLN, ZERO_PRM, MOVE_CODE, INSERT);

return;

END PROC INTRAPROC_FWD_ANALYSIS;

PROC INTERPROC_BACK_ANALYSIS(RW F, WR SOLN, ID_PRM, ZERO_PRM,
MEET_FLAG_PRM);


$ This is the master routine for performing a specific
$ interprocedural backward data flow analysis. See the
$ corresponding forward routine for general comments and a
$ description of parameters. Here we comment only on differences
$ between the forward and backward algorithms, which are as follows:

$ a. Functional composition must be computed in reverse order.

$ b. The auxiliary maps used in backward analysis are defined as
$ follows: let I be an interval, U a node in I, and V a node outside
$ I which is a successor of a node in I. Then AUX_F([U, V])
$ is defined to be the propagation effect experienced as control
$ advances from the start of U, through I, to the start of V.
$ Computing this map requires iterating through I in reverse
$ interval order three times (if I is proper) or until convergence
$ otherwise.

$ Since the outermost interval of a procedure P has no
$ successors, we regard the blocks REXIT(P) and RSTOP(P)
$ as its successors, 'hidden' inside that interval.
$ This is needed to enable us to record the effect
$ of the flow through the outermost interval in a manner
$ similar to that used for inner intervals.


$ c. In backward analysis we perform an extra step after the
$ elimination phase. In this step we compute an additional set
$ 'FEXIT' of auxiliary maps. For each node U in P, FEXIT(U)
$ represents the propagation effect of the flow from the start
$ of U to the return block of P, combined with that of the flow
$ from the start of U to the stop block of P.


$ d. In our backward analysis, code motion issues are completely
$ ignored.

$ e. The technical problem concerning endless loops discussed in
$ SECTION 5 is assumed to be resolved by preliminary processing
$ of the flow graph, in the manner suggested there.

$ Transfer constant parameters to globals

ID := ID_PRM;
ZERO := ZERO_PRM;

MEET_FLAG := MEET_FLAG_PRM;
$ This master procedure consists of the following four phases:

$ Interprocedural elimination phase

AUX_F := INTERPROC_BACK_ELIMINATE(F);

$ Compute auxiliary FEXIT maps.

FEXIT := {};

(FORALL P IN ROUTS)

INTRA_AUX_ELIMINATE(P, F, AUX_F, FEXIT);
END FORALL P;

$ Find data at procedure exits

EX_INF := EXIT_INFO(F, AUX_F, FEXIT);

$ Final propagation phase

SOLN := {}; $ Initialize the solution
(FORALL P IN ROUTS)

BACK_PROPAGATE_IN(P, FEXIT, SOLN, EX_INF(P));

END FORALL P;

return;

END PROC INTERPROC_BACK_ANALYSIS;

PROC INTERPROC_BACK_ELIMINATE(RW F);

$ This is the driver routine for the interprocedural first
$ inner-to-outer interval pass. Procedures are analyzed in
$ the following order: we process the strongly connected
$ components of the call graph in their postorder; then, for
$ each such component, we iterate through its procedures in their
$ postorder, no more than 2*D+1 times, where D is the
$ loop-interconnectedness parameter of the component.

AUX_F := {}; $ Initialize auxiliary maps

F_P := {}; $ Propagation effect thru procedures

$ Iterate through the S.C.C.s of CGRAPH
(FOR I := #CG_SCCS, #CG_SCCS-1 ... 1)

SCC := CG_SCCS(I); $ Get an S.C.C.

SCC_PROCS := SCC_NODES(SCC); $ Procs in that S.C.C.

FLOW_FLAG := 'FIRST_INTER'; $ First processing of SCC

(FOR J := 1 ... 2 * SCC_D(SCC) + 1 UNTIL PROC_CONVERGE)

PROC_CONVERGE := TRUE;

(FOR K := #SCC_PROCS, #SCC_PROCS-1 ... 1)

P := SCC_PROCS(K);

PROC_CONVERGE :=
INTRAPROC_BACK_ELIMINATE(P, AUX_F, F, F_P, FLOW_FLAG)
AND PROC_CONVERGE;

$ This routine analyzes P. Its fifth parameter indicates whether
$ the analysis is first-time interprocedural, second-time
$ interprocedural, or intraprocedural. It returns a flag to
$ indicate whether information has stabilized in P.

END FOR K;

FLOW_FLAG := 'SECOND_INTER'; $ Additional passes thru SCC
END FOR J;
END FOR I;
RETURN AUX_F;

END PROC INTERPROC_BACK_ELIMINATE;
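The visiting order used by INTERPROC_BACK_ELIMINATE can be sketched as follows. This is a hypothetical illustration under stated assumptions: the call graph is a Python dict from procedure name to callees; Tarjan's SCC algorithm is used to obtain the components with callees before callers (one valid postorder of the condensed call graph); the loop-interconnectedness bound D is supplied through a hypothetical scc_d callback; and analyze stands in for INTRAPROC_BACK_ELIMINATE. The order of procedures inside a component is simply the order in which Tarjan's algorithm pops them, which only approximates the per-component postorder used in the listing.

```python
from typing import Callable, Dict, List, Set


def sccs_postorder(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Tarjan's algorithm: each SCC is emitted after every SCC reachable
    from it, so callees come before their callers."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    result: List[List[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of an SCC
            comp: List[str] = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            result.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return result


def interproc_back_schedule(
    graph: Dict[str, List[str]],
    analyze: Callable[[str, str], bool],  # stands in for INTRAPROC_BACK_ELIMINATE
    scc_d: Callable[[List[str]], int],    # hypothetical loop-interconnectedness D
) -> None:
    for scc in sccs_postorder(graph):
        flow_flag = "FIRST_INTER"
        for _ in range(2 * scc_d(scc) + 1):   # at most 2*D+1 passes
            converged = True
            for p in scc:
                # Call analyze first so every procedure is processed on each pass.
                converged = analyze(p, flow_flag) and converged
            flow_flag = "SECOND_INTER"
            if converged:
                break
```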

IF FLOW_FLAG /= 'INTRA' THEN
$ Interprocedural analysis.

(FORALL C IN CALLS(P))

V := CESSOR(C); $ The block following the call
P1 := CALLPROC(C); $ C calls P1

$ (Note that if this routine is modified to include
$ parameter-passing assignments as part of call blocks, in the
$ manner mentioned above, then one might manipulate F_P(P1),
$ the local effect of executing P1, to get F([C, V]), rather than
$ just assign F_P(P1) to F([C, V]), as is done below.)

IF F([C, V]) /= F_P(P1) THEN

$ Update flow function for call

F([C, V]) := IF F_P(P1) = OM THEN FOM
ELSE F_P(P1) END;

$ Interval containing call must be processed

NEED_PROCESS WITH INTOF(C);

END IF;

END FORALL C;

$ If no intervals need be processed, then information has
$ stabilized and no re-processing of P need be done.
IF NEED_PROCESS = {} THEN RETURN TRUE; END;

END IF;

P_INTS := INTS(P); $ Intervals of P in reverse preorder
OUTINT := P_INTS(#P_INTS); $ Outermost interval
VEDGES{OUTINT} := {REXIT(P)}; $ 'Successors' of OUTINT
IF (SP := RSTOP(P)) /= OM THEN
VEDGES{OUTINT} WITH SP;

END IF;

(FORALL INTT := P_INTS(K) ST INTT IN NEED_PROCESS)

NEED_PROCESS WITH INTOF(INTT); $ Process containing interval
NODES := INT_NODES(INTT); $ Nodes of INTT in interval order
HEAD := NODES(1); $ Interval head

$ Get successor nodes

CESORS := VEDGES{INTT};
$ Initialize AUX_F for successor nodes. This trick simplifies
$ subsequent code considerably.

(FORALL V IN CESORS)
AUX_F([V, V]) := ID;
END FORALL V;


$ Three cases are now possible:

$ (1) INTT is proper, but not outermost; then iterate three times.

$ (2) INTT is proper, and is outermost; then iterate once.

$ (3) INTT is improper; iterate indefinitely (1 + 2*number of
$ nodes is an adequate upper bound) until convergence. (Here,
$ again, a better bound can be used; cf. SECTION 5.)

CONV_CONTROL := INTT NOTIN PROPER_INTS;
$ Test for convergence only for improper intervals.

N_ITER := $ Maximal number of iterations thru nodes of INTT
IF INTT NOTIN PROPER_INTS THEN 1 + 2 * #NODES
ELSEIF INTT = OUTINT THEN 1 ELSE 3 END;

(FORALL ND IN NODES, V IN CESORS ST ND /= V)
AUX_F([ND, V]) := FOM; $ Initialize auxiliary maps
END FORALL;


$ Iterate through nodes of INTT.

(FOR D := 1 ... N_ITER UNTIL CONVRGD)

CONVRGD := CONV_CONTROL;

$ Iterate thru nodes of INTT in reverse interval order.

(FOR J := #NODES, #NODES-1 ... 1)

ND := NODES(J);

(FORALL V IN CESORS ST V /= ND)

$ Since the 'successors' of the outermost interval are nodes
$ of that interval, we may have ND = V. In this case
$ it would be erroneous to compute AUX_F([ND, V]) (which has
$ already been set to ID) using the following 'propagation
$ from successors' formula, so we just skip such cases.

FTEMP := .MEETJOIN/
{F([ND, SND]) .COMP AUX_F([SND, V]) :
SND IN CESSOR{ND} ST
INTOF(SND) = INTT OR SND = V};
$ Note that flow graph edges (virtual or real) are either
$ edges within an interval, linking two nodes in the same
$ interval, or edges going out of an interval, or edges going
$ into an interval (these last edges are edges from (a target
$ block of) an interval to its head). It is this third kind of
$ edge that we wish to avoid propagating through in the above
$ formula. 'INTOF(SND) = INTT' tests for internal edges
$ and 'SND = V' tests for outgoing edges whose target is V.

IF FTEMP = OM THEN FTEMP := FOM; END;
CONVRGD := CONVRGD AND
(FTEMP = AUX_F([ND, V]));

AUX_F([ND, V]) := FTEMP;

END FORALL V;

END FOR J;

END FOR D;
$ (Note that no special handling of INTT's head is required.)

$ Except for the outermost interval, compute F([INTT, V]), where V
$ is a successor of some node in INTT.

IF INTT /= OUTINT THEN
$ F([INTT, V]) is trivially calculated in this case; we also
$ remove the dummy AUX_F([V, V]) entries.

(FORALL V IN CESORS)

FTEMP := F([INTT, HEAD]) .COMP AUX_F([HEAD, V]);

F([INTT, V]) := FTEMP;

AUX_F([V, V]) := OM; $ To remove this entry from AUX_F
END FORALL V;

END IF;

END FORALL INTT;

$ Compute F_P(P)

F_P(P) := AUX_F([HEAD, REXIT(P)]); $ HEAD = RENTRY(P)

IF RSTOP(P) /= OM THEN $ If P contains a stop block, calculate
$ the propagation effect to that block and combine
$ it with the 'normal' flow effect.

FZERO := AUX_F([HEAD, RSTOP(P)]) .OF ZERO;

F_P(P) := F_P(P) .MEETJOIN [FZERO, FZERO];

$ Note that a constant function C is represented by [C, C].

END IF;

VEDGES{OUTINT} := {}; $ Remove artificial edges added earlier
RETURN FALSE; $ To indicate no convergence

END PROC INTRAPROC_BACK_ELIMINATE;
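The core of INTRAPROC_BACK_ELIMINATE, the computation of AUX_F([ND, V]) for one interval, is sketched below as a hypothetical Python illustration. It assumes bit vectors held in Python ints, maps encoded as pairs (a, b) for x -> (x & a) | b, and None for FOM. As a further assumption, the test INTOF(SND) = INTT is modelled by membership among the interval's non-head nodes, so that back edges into the head are skipped. Note the argument order of comp: comp(g, f) is "g of f", so the edge map is applied after the successor's map, which is the reversed composition order that backward analysis calls for.

```python
from functools import reduce
from typing import Dict, List, Optional, Tuple

Fn = Optional[Tuple[int, int]]
IDENTITY: Tuple[int, int] = (-1, 0)


def comp(g: Fn, f: Fn) -> Fn:
    """.COMP: 'g of f' (apply f, then g)."""
    if f is None or g is None:
        return None
    return ((f[0] & g[0]) | g[1], (f[1] & g[0]) | g[1])


def meetjoin(g: Fn, f: Fn, meet: bool) -> Fn:
    """.MEETJOIN: pointwise meet or join; None (FOM) is neutral."""
    if f is None:
        return g
    if g is None:
        return f
    if meet:
        return (f[0] & g[0], f[1] & g[1])
    return (f[0] | g[0], f[1] | g[1])


def interval_aux_maps(
    nodes: List[str],                   # nodes of INTT in interval order, head first
    succ: Dict[str, List[str]],         # CESSOR: flow-graph successors
    edge_f: Dict[Tuple[str, str], Fn],  # F([ND, SND]) for each edge
    cesors: List[str],                  # successors of INTT (VEDGES{INTT})
    proper: bool,
    outermost: bool,
    meet: bool,
) -> Dict[Tuple[str, str], Fn]:
    members = set(nodes[1:])            # stands in for INTOF(SND) = INTT
    aux: Dict[Tuple[str, str], Fn] = {(v, v): IDENTITY for v in cesors}
    for nd in nodes:
        for v in cesors:
            if nd != v:
                aux[(nd, v)] = None     # FOM
    n_iter = 1 + 2 * len(nodes) if not proper else (1 if outermost else 3)
    for _ in range(n_iter):
        converged = not proper          # only improper intervals test for convergence
        for nd in reversed(nodes):      # reverse interval order
            for v in cesors:
                if v == nd:
                    continue
                terms = [comp(edge_f[(nd, snd)], aux[(snd, v)])
                         for snd in succ.get(nd, [])
                         if snd in members or snd == v]
                ftemp = reduce(lambda g, f: meetjoin(g, f, meet), terms, None)
                converged = converged and ftemp == aux[(nd, v)]
                aux[(nd, v)] = ftemp
        if converged:
            break
    return aux
```

In a small interval h -> a -> b with a back edge b -> h and exits a -> x and b -> x, the resulting AUX_F([h, x]) equals the join of the two exit paths, and the back edge contributes nothing.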

PROC INTRA_AUX_ELIMINATE(P, F, AUX_F, RW FEXIT);

$ This procedure performs an additional intraprocedural
$ elimination, during which we compute, for each node N in P,
$ a map FEXIT(N) representing the effect of flow from the
$ start of N up to an exit of P.

P_INTS := INTS(P);

OUTINT := P_INTS(#P_INTS);

EP := REXIT(P);
SP := RSTOP(P);

$ First process nodes of OUTINT

OUTNODES := INT_NODES(OUTINT);
(FORALL ND := OUTNODES(I))

FEXIT(ND) := AUX_F([ND, EP]); $ Get the effect of flow to EP
IF SP /= OM THEN $ If there is also a stop block

FZERO := AUX_F([ND, SP]) .OF ZERO;

FTEMP := IF FZERO = XOM THEN FOM ELSE [FZERO, FZERO] END;

FEXIT(ND) := FEXIT(ND) .MEETJOIN FTEMP;

END IF;

END FORALL ND;

$ Next process all remaining intervals in outer-to-inner order
(FOR J := #P_INTS-1, #P_INTS-2 ... 1)

INTT := P_INTS(J);
CESORS := VEDGES{INTT};

NODES := INT_NODES(INTT);
(FORALL ND := NODES(K))

FEXIT(ND) := .MEETJOIN/
{AUX_F([ND, V]) .COMP FEXIT(V) : V IN CESORS};

END FORALL ND;

END FOR J;

return;

END PROC INTRA_AUX_ELIMINATE;
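A hypothetical Python rendering of INTRA_AUX_ELIMINATE, together with the one-line step that BACK_PROPAGATE_IN (later in this module) performs, is sketched below. It assumes bit vectors held in Python ints, maps encoded as pairs (a, b) for x -> (x & a) | b, None for FOM/XOM, intervals in reverse preorder with the outermost last, and AUX_F keyed by (node, successor) pairs; all function and parameter names are illustrative.

```python
from typing import Dict, List, Optional, Tuple

Fn = Optional[Tuple[int, int]]
Val = Optional[int]


def comp(g: Fn, f: Fn) -> Fn:
    """.COMP: 'g of f' (apply f, then g)."""
    if f is None or g is None:
        return None
    return ((f[0] & g[0]) | g[1], (f[1] & g[0]) | g[1])


def meetjoin(g: Fn, f: Fn, meet: bool) -> Fn:
    if f is None:
        return g
    if g is None:
        return f
    if meet:
        return (f[0] & g[0], f[1] & g[1])
    return (f[0] | g[0], f[1] | g[1])


def of(f: Fn, x: Val) -> Val:
    if f is None or x is None:
        return None
    return (x & f[0]) | f[1]


def fexit_maps(
    ints: List[str],                  # intervals in reverse preorder; last is outermost
    int_nodes: Dict[str, List[str]],  # interval -> its nodes
    vedges: Dict[str, List[str]],     # interval -> its successors (VEDGES)
    aux: Dict[Tuple[str, str], Fn],   # AUX_F([ND, V])
    rexit: str,
    rstop: Optional[str],
    zero: int,
    meet: bool,
) -> Dict[str, Fn]:
    outint = ints[-1]
    fexit: Dict[str, Fn] = {}
    for nd in int_nodes[outint]:
        fe = aux.get((nd, rexit))     # effect of flow to the return block
        if rstop is not None:
            fzero = of(aux.get((nd, rstop)), zero)
            ftemp = None if fzero is None else (fzero, fzero)  # constant map [C, C]
            fe = meetjoin(fe, ftemp, meet)
        fexit[nd] = fe
    for intt in reversed(ints[:-1]):  # remaining intervals, outer-to-inner
        for nd in int_nodes[intt]:
            fe = None
            for v in vedges[intt]:
                fe = meetjoin(fe, comp(aux.get((nd, v)), fexit[v]), meet)
            fexit[nd] = fe
    return fexit


def back_propagate_in(fexit: Dict[str, Fn], ex_val: int) -> Dict[str, Val]:
    """SOLN(U) := FEXIT(U) .OF EX_VAL for every node U."""
    return {u: of(f, ex_val) for u, f in fexit.items()}
```

Processing the intervals outer-to-inner guarantees that FEXIT(V) is already available for every successor V of an inner interval, since such successors lie in an enclosing interval.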

PROC EXIT_INFO(F, AUX_F, FEXIT);

$ This function calculates and returns a mapping which sends
$ each procedure P into the flow information available at exit
$ from P. It is called (only in the interprocedural case) just
$ before we begin the final outer-to-inner propagation phase.

$ First we construct a map CGF assigning, to each edge [P, Q]
$ of the call graph, a data-propagation map describing the
$ propagation effect as control returns from the exit of Q to P
$ after any call in P to Q, and then advances to the exit of P.

CGF := {};

(FORALL [P, Q] IN CGRAPH) CGF([P, Q]) := FOM; END;

(FORALL Q := CALLPROC(C)) $ For all calls within all procedures

P := ROUTOF(C); $ [P, Q] is an edge of the call graph
C1 := CESSOR(C); $ C1 is the block immediately after C
CGF([P, Q]) := CGF([P, Q]) .MEETJOIN FEXIT(C1);

$ Note that since we are dealing with a backward analysis, we
$ want to propagate data from the exit of the calling procedure P
$ to the exit of the called procedure Q. This direction of
$ propagation, however, makes our problem a forward problem
$ for the call graph.

END FORALL Q;

$ Next we iterate through the call graph in 'invocation order', i.e.
$ process the strongly connected components in reverse postorder
$ and the set of procedures within each strongly connected
$ component in reverse postorder also.

EX_INF := {[P, XOM] : P IN ROUTS}; $ Initialize solution

EX_INF(SYM_MAIN) := ZERO;

CGRINV := {[Q, P] : [P, Q] IN CGRAPH};

(FOR I := 2 ... #CG_SCCS) $ Pick S.C.C.s in reverse postorder


SCC := CG_SCCS(I);

SCC_PROCS := SCC_NODES(SCC); $ Procs in SCC in rev. postorder
(FOR N := 1 ... SCC_D(SCC) + 1 UNTIL CONVRGD)

CONVRGD := TRUE;

(FORALL P := SCC_PROCS(K))

TEMP := .MJV/ {CGF([Q, P]) .OF EX_INF(Q) : Q IN CGRINV{P}};

$ Test for convergence

CONVRGD := CONVRGD AND (TEMP = EX_INF(P));

EX_INF(P) := TEMP;

END FORALL P;
END FOR N;
END FOR I;
RETURN EX_INF;
END PROC EXIT_INFO;
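EXIT_INFO solves a forward problem on the call graph. A hypothetical Python sketch follows. It assumes the same kind of encoding (bit vectors as Python ints, maps as pairs (a, b) for x -> (x & a) | b, None for FOM/XOM), that the SCCs are supplied in reverse postorder with the main program alone in the first one (mirroring the loop that starts at I = 2), and that the callers of each procedure and the bound D are passed in as a dict and a callback. The names build_cgf, exit_info and scc_d are assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

Fn = Optional[Tuple[int, int]]
Val = Optional[int]


def meetjoin(g: Fn, f: Fn, meet: bool) -> Fn:
    if f is None:
        return g
    if g is None:
        return f
    if meet:
        return (f[0] & g[0], f[1] & g[1])
    return (f[0] | g[0], f[1] | g[1])


def mjv(x: Val, y: Val, meet: bool) -> Val:
    if x is None:
        return y
    if y is None:
        return x
    return x & y if meet else x | y


def of(f: Fn, x: Val) -> Val:
    if f is None or x is None:
        return None
    return (x & f[0]) | f[1]


def build_cgf(calls: List[Tuple[str, str, Fn]], meet: bool) -> Dict[Tuple[str, str], Fn]:
    """calls: (caller, callee, FEXIT of the block after the call) triples."""
    cgf: Dict[Tuple[str, str], Fn] = {}
    for p, q, fexit_after in calls:
        cgf[(p, q)] = meetjoin(cgf.get((p, q)), fexit_after, meet)
    return cgf


def exit_info(
    sccs: List[List[str]],            # reverse postorder; sccs[0] holds only main
    callers: Dict[str, List[str]],    # procedure -> its callers (CGRINV)
    cgf: Dict[Tuple[str, str], Fn],   # CGF([caller, callee])
    main: str,
    zero: int,
    meet: bool,
    scc_d: Callable[[List[str]], int],
) -> Dict[str, Val]:
    ex_inf: Dict[str, Val] = {p: None for scc in sccs for p in scc}
    ex_inf[main] = zero
    for scc in sccs[1:]:
        for _ in range(scc_d(scc) + 1):   # at most D+1 passes
            converged = True
            for p in scc:
                temp: Val = None
                for q in callers.get(p, []):
                    temp = mjv(temp, of(cgf[(q, p)], ex_inf[q]), meet)
                converged = converged and temp == ex_inf[p]
                ex_inf[p] = temp
            if converged:
                break
    return ex_inf
```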

PROC BACK_PROPAGATE_IN(P, FEXIT, RW SOLN, EX_VAL);

$ This procedure performs outer-to-inner back propagation for
$ a routine P, using the 'FEXIT' information. EX_VAL is the flow
$ information assumed (or known) at the procedure return block,
$ whereas 'ZERO' is always assumed at the stop block of P (but this
$ assumption has already been used in calculating the FEXIT maps).

(FORALL INTT IN INTS(P), U IN INT_NODES(INTT))

SOLN(U) := FEXIT(U) .OF EX_VAL;

END FORALL;


return;

END PROC BACK_PROPAGATE_IN;

PROC INTRAPROC_BACK_ANALYSIS(P, RW F, WR SOLN, ID_PRM, ZERO_PRM,

MEET_FLAG_PRM);

$ This is the master routine to perform a specific backward data flow
$ analysis intraprocedurally for a routine P whose local
$ variables are to be analyzed. For more details, comments, and a
$ description of parameters, see the corresponding interprocedural
$ analyser.

ID := ID_PRM;

ZERO := ZERO_PRM;

MEET_FLAG := MEET_FLAG_PRM;

AUX_F := F_P := {};

FLAG := INTRAPROC_BACK_ELIMINATE(P, AUX_F, F, F_P, 'INTRA');
$ FLAG is not used in this case

FEXIT := {};

INTRA_AUX_ELIMINATE(P, F, AUX_F, FEXIT);


SOLN := {};

BACK_PROPAGATE_IN(P, FEXIT, SOLN, ZERO);

$ Note that in the intraprocedural case the last two procedures
$ can be combined to form a single procedure almost identical
$ to 'INTRA_AUX_ELIMINATE', except that this procedure computes the
$ 'SOLN' map directly instead of the 'FEXIT' maps.

return;

END PROC INTRAPROC_BACK_ANALYSIS;
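The remark that, intraprocedurally, INTRA_AUX_ELIMINATE and BACK_PROPAGATE_IN can be merged can be illustrated as follows: instead of building FEXIT maps, the value ZERO (assumed at both the return block and the stop block) is pushed directly through the AUX_F maps, outermost interval first. This is a hypothetical sketch, not the optimizer's code. It assumes bit vectors held in Python ints, maps encoded as pairs (a, b) for x -> (x & a) | b, None for OM, intervals in reverse preorder with the outermost last, and AUX_F keyed by (node, successor) pairs; intra_back_soln and its parameters are illustrative names.

```python
from typing import Dict, List, Optional, Tuple

Fn = Optional[Tuple[int, int]]
Val = Optional[int]


def of(f: Fn, x: Val) -> Val:
    if f is None or x is None:
        return None
    return (x & f[0]) | f[1]


def mjv(x: Val, y: Val, meet: bool) -> Val:
    if x is None:
        return y
    if y is None:
        return x
    return x & y if meet else x | y


def intra_back_soln(
    ints: List[str],                  # intervals in reverse preorder; last is outermost
    int_nodes: Dict[str, List[str]],
    vedges: Dict[str, List[str]],     # interval -> its successors
    aux: Dict[Tuple[str, str], Fn],   # AUX_F([ND, V])
    rexit: str,
    rstop: Optional[str],
    zero: int,
    meet: bool,
) -> Dict[str, Val]:
    outint = ints[-1]
    soln: Dict[str, Val] = {}
    for nd in int_nodes[outint]:
        val = of(aux.get((nd, rexit)), zero)
        if rstop is not None:
            val = mjv(val, of(aux.get((nd, rstop)), zero), meet)
        soln[nd] = val
    for intt in reversed(ints[:-1]):  # outer-to-inner
        for nd in int_nodes[intt]:
            val: Val = None
            for v in vedges[intt]:
                val = mjv(val, of(aux.get((nd, v)), soln[v]), meet)
            soln[nd] = val
    return soln
```

Applying a map to a value and then combining values gives the same result as combining maps and then applying, so this yields the same SOLN as the FEXIT route with EX_VAL = ZERO; it merely avoids storing the intermediate maps.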

$ Here are the operators which manipulate the data propagation maps
$ and data states.

OP .COMP(G, F); $ Functional composition G of F

RETURN

IF F = FOM OR G = FOM THEN FOM

ELSE [F(1) * G(1) + G(2), F(2) * G(1) + G(2)]

END;
END OP .COMP;


OP .MEETJOIN(G, F); $ Functional meet or join
RETURN

IF F = FOM THEN G

ELSEIF G = FOM THEN F

ELSEIF MEET_FLAG THEN [F(1) * G(1), F(2) * G(2)]

ELSE [F(1) + G(1), F(2) + G(2)]

END;
END OP .MEETJOIN;


OP .MJV(X, Y); $ Meet or join of lattice elements

RETURN

IF X = XOM THEN Y
ELSEIF Y = XOM THEN X
ELSEIF MEET_FLAG THEN X * Y
ELSE X + Y

END;
END OP .MJV;


OP .OF(F, X); $ Functional application

RETURN IF X = XOM OR F = FOM THEN XOM
ELSE F(1) * X + F(2)

END;
END OP .OF;
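The four operators above can be transcribed into Python as a hypothetical sketch that shows how the pair encoding works. Assumptions: a data state is a bit vector stored as a Python int, SETL's * and + on sets become & and |, a propagation map [A, B] stands for X -> A * X + B, and None plays the role of both FOM and XOM. The encoding of a constant C as [C, C] relies on absorption (C * X + C = C). One can check algebraically that .COMP and the join case of .MEETJOIN agree with pointwise composition and union for any maps, while the meet case agrees with pointwise intersection only when B is a subset of A in each map; [C, C] and the identity [U, {}] (U the full fact set) satisfy this, and .COMP and both cases of .MEETJOIN preserve it.

```python
from typing import Optional, Tuple

Fn = Optional[Tuple[int, int]]  # (a, b) encodes x -> (x & a) | b; None plays FOM
Val = Optional[int]             # bit vector; None plays XOM


def comp(g: Fn, f: Fn) -> Fn:
    """.COMP: the function 'g of f', i.e. apply f first, then g."""
    if f is None or g is None:
        return None
    return ((f[0] & g[0]) | g[1], (f[1] & g[0]) | g[1])


def meetjoin(g: Fn, f: Fn, meet: bool) -> Fn:
    """.MEETJOIN: pointwise meet (intersection) or join (union) of two maps."""
    if f is None:
        return g
    if g is None:
        return f
    if meet:
        return (f[0] & g[0], f[1] & g[1])
    return (f[0] | g[0], f[1] | g[1])


def mjv(x: Val, y: Val, meet: bool) -> Val:
    """.MJV: meet or join of two lattice values."""
    if x is None:
        return y
    if y is None:
        return x
    return x & y if meet else x | y


def of(f: Fn, x: Val) -> Val:
    """.OF: apply the map f to the value x."""
    if x is None or f is None:
        return None
    return (f[0] & x) | f[1]
```

For example, with three facts, of((0b011, 0b011), 0b100) gives 0b011: the constant map [C, C] ignores its argument, as intended.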


end module setl optimizer - data-flow solver;